Abstract

We  consider  two-player  infinite  games  played  on  graphs.  The  games 
are  concurrent,  in  that  at  each  state  the  players  choose  their  moves 
simultaneously  and  independently,  and  stochastic,  in  that  the  moves 
determine  a  probability  distribution  for  the  successor  state.  The  value 
of  a  game  is  the  maximal  probability  with  which  a  player  can  guarantee 
the satisfaction of her objective. We show that the values of concurrent
games with ω-regular objectives expressed as parity conditions can be
computed in NP ∩ coNP. This result substantially improves the best
previously known bound of 3EXPTIME. It also shows that the full class
of concurrent parity games is no harder than the special cases of
turn-based deterministic parity games (Emerson-Jutla) and of turn-based
stochastic reachability games (Condon), for both of which NP ∩ coNP
is the best known bound.

While the previous, more restricted NP ∩ coNP results for graph
games relied on the existence of particularly simple (pure memoryless)
optimal strategies, in concurrent games with parity objectives optimal
strategies may not exist, and ε-optimal strategies (which achieve the
value of the game within a parameter ε > 0) require in general both
randomization and infinite memory. Hence our proof must rely on a
more detailed analysis of strategies and, in addition to the main result,
yields two results that are interesting on their own. First, we show
that there exist ε-optimal strategies that in the limit coincide with
memoryless strategies; this parallels the celebrated result of
Mertens-Neyman for concurrent games with limit-average objectives. Second,
we complete the characterization of the memory requirements for
ε-optimal strategies for concurrent ω-regular games, by showing that
memoryless strategies suffice for ε-optimality for coBüchi conditions.


1  Introduction 

We  consider  recursive  games  played  between  two  players  over  a  graph  [22, 
10,  16].  The  games  proceed  in  an  infinite  number  of  rounds.  At  each  round, 
the  players  choose  moves;  the  two  moves,  together  with  the  current  state, 
determine  a  probability  distribution  for  the  successor  state.  An  outcome  of 
the  game,  or  play,  consists  in  the  infinite  sequence  of  states  visited.  These 
graph  games  can  be  broadly  classified  into  turn-based,  and  concurrent  games. 
In  turn-based  games,  in  any  given  round  only  one  player  can  choose  among 
multiple  moves:  effectively,  the  set  of  states  of  the  graph  can  be  partitioned 
into the states where it is player 1's turn to play, and the states where
it  is  player  2’s  turn  to  play.  In  concurrent  games,  both  players  may  have 
multiple  moves  available  at  each  state,  and  the  players  choose  their  moves 
simultaneously  and  independently. 

An important class of winning conditions are the ω-regular languages.
In such games, the goal of player 1 is to ensure that the play belongs to
a specified ω-regular language; the goal of player 2 is to ensure that the
play does not belong to the language. The games are thus zero-sum: the
objectives of the two players are complementary. The ω-regular languages
are the generalization to infinite words of the classical regular languages [24];
the properties expressible by ω-regular languages include safety, reachability,
and fairness. Games with ω-regular winning conditions have been applied to
system synthesis [2, 21, 19] and verification [9, 13, 7]. Of particular interest
are ω-regular languages that are given as parity conditions on game graphs;
this is because every ω-regular game can be converted into a parity game
[18, 25, 26].

Given a recursive game and an ω-regular language $\mathcal{L}$, the value
$\langle\langle 1 \rangle\rangle_{val}(\mathcal{L})(s)$ of the game for player 1 at a state $s$ is equal to the
maximal probability with which player 1 can ensure that the play lies in
$\mathcal{L}$; the value $\langle\langle 2 \rangle\rangle_{val}(\mathcal{L})(s)$ of the game for player 2 at $s$ is equal to
the maximal probability with which player 2 can ensure that the play lies
outside $\mathcal{L}$. Martin's determinacy theorem ensures that
$\langle\langle 1 \rangle\rangle_{val}(\mathcal{L})(s) + \langle\langle 2 \rangle\rangle_{val}(\mathcal{L})(s) = 1$ [15]. Except for the
special case of turn-based games, little has been known about the
computational complexity of finding the value for a recursive game with
an ω-regular winning condition. In the turn-based case, it is known that the
value of games with ω-regular conditions can be computed in NP ∩ coNP.
This result was first obtained for turn-based deterministic parity games,
in which each move determines uniquely (instead of probabilistically) the
successor state [9], and for turn-based stochastic reachability games [5]; the
case of turn-based stochastic parity games was shown in [3].

Concurrent games are substantially more complex than turn-based
games in several respects. To see this, consider the structure of optimal
strategies, which are strategies that achieve the value of a given game. For
turn-based stochastic ω-regular games, there always exist pure (deterministic)
optimal strategies, which do not rely on randomized choice [3]; in the
case of turn-based stochastic parity games, moreover, there are always pure
memoryless optimal strategies, where the choice of move depends only on
the current state, rather than also on the past history of the game. It is this
observation that led to the NP ∩ coNP result for turn-based parity games.

By contrast, in concurrent games, already for reachability conditions,
players must in general play with randomized (non-pure) strategies, which
prescribe, at each round, a probability distribution over the moves to be
played. Furthermore, optimal strategies may not exist: rather, for every
real ε > 0, the players have ε-optimal strategies, which achieve the value
of the game within ε. Even for relatively simple winning conditions, such
as Büchi conditions, ε-optimal strategies need both randomization and
infinite memory [8]. It is therefore not inconceivable that the complexity of
concurrent ω-regular games might be considerably worse than NP ∩ coNP.
The only previously known algorithm for computing the value of concurrent
parity games is triple-exponential [8]: it was obtained via a reduction to the
theory of the real closed field, by using decision procedures for the theory
of reals with addition and multiplication [23, 1].

In this paper, we show that the problem of computing the value of a
concurrent parity game is in NP ∩ coNP. More precisely, as the value of
a concurrent game at a state can be an irrational number, we show that
given an encoding of the game and of a rational ε > 0, the problem of
approximating the value of the game within ε can be solved in NP ∩ coNP.
This result generalizes the best known upper bound (NP ∩ coNP) for very
restricted cases, such as turn-based deterministic parity games and
turn-based stochastic reachability games, to the class of all concurrent parity
games.

The basic idea behind the proof, which can no longer rely on the existence
of pure memoryless optimal strategies, is as follows. We call a value class a
maximal set of states where the game has the same value for player 1. By the
results of [6] on qualitative winning (i.e., winning with probability 1), if the
(player 1) value of the game is not constant 1 or 0, then there are two
non-empty value classes $W_1$ and $W_2$ where the value is 1 and 0, respectively. We
show that if the players play ε-optimal strategies, then $W_1 \cup W_2$ is reached
with probability 1. Through a detailed analysis of the branching structure of
the stochastic process of the game, we go on to show that we can construct an
ε-optimal strategy by stitching together strategies, one per value class.
This gives us a polynomial witness for the resulting strategy and proves
membership in NP; membership in NP ∩ coNP follows from the fact that
the problem is symmetrical in players 1 and 2.

A detailed analysis of our proof gives us several new results about the
structure of ε-optimal strategies in concurrent parity games. First, we show
that concurrent games with coBüchi winning conditions admit memoryless
ε-optimal strategies. This result completes the characterization of the memory
requirements of ε-optimal strategies for concurrent ω-regular games: it was
previously known that safety and reachability games admit memoryless
ε-optimal strategies [11, 8], and that Büchi conditions may require infinite
memory [8]. Second, we show that in concurrent parity games, the limit
of the ε-optimal strategies for ε → 0 is a memoryless strategy (which in
general is not optimal). This result parallels the celebrated result of
Mertens-Neyman [17] for concurrent games with limit-average objectives.

2  Definitions 

Notation. For a countable set $A$, a probability distribution on $A$ is a
function $\delta : A \to [0, 1]$ such that $\sum_{a \in A} \delta(a) = 1$. We denote the set of
probability distributions on $A$ by $\mathcal{D}(A)$. Given a distribution $\delta \in \mathcal{D}(A)$, we
denote by $\mathrm{Supp}(\delta) = \{ x \in A \mid \delta(x) > 0 \}$ the support of $\delta$.

Definition 1 (Concurrent Games) A (two-player) concurrent game
structure $G = (S, \mathit{Moves}, \Gamma_1, \Gamma_2, \delta)$ consists of the following components:

• A finite state space $S$.

• A finite set $\mathit{Moves}$ of moves.

• Two move assignments $\Gamma_1, \Gamma_2 : S \to 2^{\mathit{Moves}} \setminus \emptyset$. For $i \in \{1, 2\}$,
assignment $\Gamma_i$ associates with each state $s \in S$ the non-empty set
$\Gamma_i(s) \subseteq \mathit{Moves}$ of moves available to player $i$ at state $s$.

• A probabilistic transition function $\delta : S \times \mathit{Moves} \times \mathit{Moves} \to \mathcal{D}(S)$
that gives the probability $\delta(s, a_1, a_2)(t)$ of a transition from $s$ to $t$ when
player 1 plays move $a_1$ and player 2 plays move $a_2$, for all $s, t \in S$ and
$a_1 \in \Gamma_1(s)$, $a_2 \in \Gamma_2(s)$. ■

We  distinguish  the  following  special  classes  of  concurrent  game  structures. 

• A concurrent game structure $G$ is deterministic if for all $s \in S$ and all
$a_1 \in \Gamma_1(s)$, $a_2 \in \Gamma_2(s)$, there is a $t \in S$ such that $\delta(s, a_1, a_2)(t) = 1$.

• A concurrent game structure $G$ is turn-based if at every state at most
one player can choose among multiple moves; that is, if for every state
$s \in S$ there exists at most one $i \in \{1, 2\}$ with $|\Gamma_i(s)| > 1$.

We define the size of the game structure $G$ to be equal to the size of the
transition function $\delta$; specifically,
$|G| = \sum_{s \in S} \sum_{a \in \Gamma_1(s)} \sum_{b \in \Gamma_2(s)} \sum_{t \in S} |\delta(s, a, b)(t)|$,
where $|\delta(s, a, b)(t)|$ denotes the space needed to specify the probability
distribution. We write $n$ to denote the size of the state space, i.e., $n = |S|$.
At every state $s \in S$, player 1 chooses a move $a_1 \in \Gamma_1(s)$, and
simultaneously and independently player 2 chooses a move $a_2 \in \Gamma_2(s)$. The
game then proceeds to the successor state $t$ with probability
$\delta(s, a_1, a_2)(t)$, for all $t \in S$. A state $s$ is called an absorbing
state if for all $a_1 \in \Gamma_1(s)$ and $a_2 \in \Gamma_2(s)$ we have $\delta(s, a_1, a_2)(s) = 1$. In
other words, at $s$, for all choices of moves of the players, the next state is
always $s$. A state $s$ is a turn-based state if there exists $i \in \{1, 2\}$ such
that $|\Gamma_i(s)| = 1$. Moreover, if $|\Gamma_2(s)| = 1$ then the state $s$ is a
player-1 turn-based state, since the choice of moves for player 2 is trivial; and if
$|\Gamma_1(s)| = 1$ then it is a player-2 turn-based state. We assume that the players
act non-cooperatively, i.e., each player chooses her strategy independently
and secretly from the other player, and is only interested in maximizing her
own reward. For all states $s \in S$ and moves $a_1 \in \Gamma_1(s)$ and $a_2 \in \Gamma_2(s)$, we
indicate by $\mathrm{Dest}(s, a_1, a_2) = \mathrm{Supp}(\delta(s, a_1, a_2))$ the set of possible successors
of $s$ when moves $a_1, a_2$ are selected.
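As an illustration only, Definition 1 and the derived notions (Dest, absorbing states, turn-based states, deterministic structures) can be written down directly in Python. The class name `ConcurrentGame`, its field names and the three-state toy game below are assumptions made for this sketch; they do not appear in the paper.

```python
from dataclasses import dataclass
from fractions import Fraction
from typing import Dict, FrozenSet, List, Set, Tuple

State = str
Move = str
Dist = Dict[State, Fraction]  # a probability distribution over states


@dataclass
class ConcurrentGame:
    """Hypothetical container for G = (S, Moves, Gamma_1, Gamma_2, delta)."""
    states: FrozenSet[State]
    gamma1: Dict[State, FrozenSet[Move]]  # moves of player 1 at each state
    gamma2: Dict[State, FrozenSet[Move]]  # moves of player 2 at each state
    delta: Dict[Tuple[State, Move, Move], Dist]

    def dest(self, s: State, a1: Move, a2: Move) -> Set[State]:
        """Dest(s, a1, a2) = Supp(delta(s, a1, a2))."""
        return {t for t, p in self.delta[(s, a1, a2)].items() if p > 0}

    def joint_moves(self, s: State) -> List[Tuple[Move, Move]]:
        return [(a1, a2) for a1 in self.gamma1[s] for a2 in self.gamma2[s]]

    def is_absorbing(self, s: State) -> bool:
        # Every joint move leads back to s with probability 1.
        return all(self.dest(s, a1, a2) == {s}
                   for a1, a2 in self.joint_moves(s))

    def is_turn_based_state(self, s: State) -> bool:
        # At least one player has a single (trivial) move at s.
        return len(self.gamma1[s]) == 1 or len(self.gamma2[s]) == 1

    def is_turn_based(self) -> bool:
        return all(self.is_turn_based_state(s) for s in self.states)

    def is_deterministic(self) -> bool:
        # Every joint move has exactly one possible successor.
        return all(len(self.dest(s, a1, a2)) == 1
                   for s in self.states
                   for a1, a2 in self.joint_moves(s))


# Toy game (an assumption of this sketch): one concurrent state "s"
# and two absorbing states "win" and "lose".
ONE = Fraction(1)
STAY = frozenset({"stay"})
game = ConcurrentGame(
    states=frozenset({"s", "win", "lose"}),
    gamma1={"s": frozenset({"hide", "run"}), "win": STAY, "lose": STAY},
    gamma2={"s": frozenset({"wait", "throw"}), "win": STAY, "lose": STAY},
    delta={
        ("s", "hide", "wait"): {"s": ONE},
        ("s", "run", "wait"): {"win": ONE},
        ("s", "hide", "throw"): {"win": ONE},
        ("s", "run", "throw"): {"lose": ONE},
        ("win", "stay", "stay"): {"win": ONE},
        ("lose", "stay", "stay"): {"lose": ONE},
    },
)
```

In this toy game the structure is deterministic but not turn-based, because both players have two moves at "s".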

A path or a play $\omega$ of $G$ is an infinite sequence $\omega = \langle s_0, s_1, s_2, \ldots \rangle$ of states
in $S$ such that for all $k \ge 0$, there are moves $a_1^k \in \Gamma_1(s_k)$ and $a_2^k \in \Gamma_2(s_k)$
with $\delta(s_k, a_1^k, a_2^k)(s_{k+1}) > 0$. We denote by $\Omega$ the set of all paths and by
$\Omega_s$ the set of all paths $\omega = \langle s_0, s_1, s_2, \ldots \rangle$ such that $s_0 = s$, i.e., the set of plays
starting from state $s$.

2.1  Randomized  strategies

A selector $\xi$ for player $i \in \{1, 2\}$ is a function $\xi : S \to \mathcal{D}(\mathit{Moves})$ such that for
all $s \in S$ and $a \in \mathit{Moves}$, if $\xi(s)(a) > 0$ then $a \in \Gamma_i(s)$. We denote by $\Lambda_i$ the
set of all selectors for player $i \in \{1, 2\}$. A selector $\xi$ is pure if for every $s \in S$
there is $a \in \mathit{Moves}$ such that $\xi(s)(a) = 1$; we denote by $\Lambda_i^P \subseteq \Lambda_i$ the set of
pure selectors for player $i$. A strategy for player 1 is a function $\sigma : S^+ \to \Lambda_1$
that associates with every finite non-empty sequence of states, representing the
history of the play so far, a selector. Similarly we define strategies $\pi$ for
player 2. A strategy $\sigma$ for player $i$ is pure if it yields only pure selectors,
that is, it is of type $S^+ \to \Lambda_i^P$. A strategy with memory can be described as
a pair of functions over a set $M$ of memory states: (a) a memory update
function $\sigma_u : S \times M \to M$, and (b) a next move function $\sigma_m : S \times M \to \Lambda_1$.
A strategy with memory is finite-memory if $M$ is finite. A memoryless strategy
is independent of the history of the play and depends only on the current state.
Memoryless strategies coincide with selectors, and we often write $\sigma$ for the
selector corresponding to a memoryless strategy $\sigma$. A strategy is pure
memoryless if it is pure and memoryless. We denote by $\Sigma^P$, $\Sigma^F$, $\Sigma^{PM}$ the
families of pure, finite-memory, and pure memoryless strategies for player 1,
respectively. Analogously we define the families of strategies for player 2.
We denote by $\Sigma$ and $\Pi$ the sets of all strategies for player 1 and player 2,
respectively.

Once the starting state $s$ and the strategies $\sigma$ and $\pi$ for the two players
have been chosen, the game is reduced to an ordinary stochastic process.
Hence, the probabilities of events are uniquely defined, where an event
$A \subseteq \Omega_s$ is a measurable set of paths. For an event $A \subseteq \Omega_s$, we denote by $\Pr_s^{\sigma,\pi}(A)$
the probability that a path belongs to $A$ when the game starts from $s$ and
the players follow the strategies $\sigma$ and $\pi$. For $i \ge 0$, we also denote by
$\Theta_i : \Omega_s \to S$ the random variable denoting the $i$-th state along a path.
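When both players use memoryless strategies (selectors), the stochastic process is a finite Markov chain, and the distribution of the $i$-th state $\Theta_i$ can be computed exactly by pushing a distribution through one round at a time. The sketch below assumes this memoryless setting; the function names and the toy transition table are illustrative assumptions, not notation from the paper.

```python
from fractions import Fraction


def step(dist, delta, sel1, sel2):
    """Push a distribution over states through one round of the game."""
    nxt = {}
    for s, ps in dist.items():
        for a1, p1 in sel1[s].items():          # player 1's selector at s
            for a2, p2 in sel2[s].items():      # player 2's selector at s
                for t, pt in delta[(s, a1, a2)].items():
                    nxt[t] = nxt.get(t, 0) + ps * p1 * p2 * pt
    return nxt


def state_distribution(start, i, delta, sel1, sel2):
    """Distribution of Theta_i when the play starts at `start`."""
    dist = {start: Fraction(1)}
    for _ in range(i):
        dist = step(dist, delta, sel1, sel2)
    return dist


# Toy game (an assumption of this sketch): "win" and "lose" are absorbing.
ONE, HALF = Fraction(1), Fraction(1, 2)
DELTA = {
    ("s", "hide", "wait"): {"s": ONE},
    ("s", "run", "wait"): {"win": ONE},
    ("s", "hide", "throw"): {"win": ONE},
    ("s", "run", "throw"): {"lose": ONE},
    ("win", "stay", "stay"): {"win": ONE},
    ("lose", "stay", "stay"): {"lose": ONE},
}
SEL1 = {"s": {"hide": HALF, "run": HALF}, "win": {"stay": ONE}, "lose": {"stay": ONE}}
SEL2 = {"s": {"wait": HALF, "throw": HALF}, "win": {"stay": ONE}, "lose": {"stay": ONE}}

after_two_rounds = state_distribution("s", 2, DELTA, SEL1, SEL2)
```

Exact rational arithmetic is used so that the resulting probabilities can be compared without rounding error; for strategies with memory the same idea applies to the product of the game with the memory set.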

2.2  Objectives 

We specify objectives for the players by providing the set of winning plays
$\Phi \subseteq \Omega$ for each player. In this paper we study only zero-sum games [20, 11],
where the objectives of the two players are strictly competitive. In other
words, it is implicit that if the objective of one player is $\Phi$, then the objective
of the other player is $\Omega \setminus \Phi$. Given a game graph $G$ and an objective $\Phi \subseteq \Omega$,
we write $(G, \Phi)$ for the game played on the graph $G$ with the objective $\Phi$
for player 1.

A general class of objectives are the Borel objectives [12]. A Borel
objective $\Phi \subseteq S^\omega$ is a Borel set in the Cantor topology on $S^\omega$. In this
paper we consider ω-regular objectives [26], which lie in the first 2 1/2
levels of the Borel hierarchy (i.e., in the intersection of $\Sigma_3$ and $\Pi_3$). The
ω-regular objectives, and subclasses thereof, can be specified in the following
forms. For a play $\omega = \langle s_0, s_1, s_2, \ldots \rangle \in \Omega$, we define
$\mathrm{Inf}(\omega) = \{ s \in S \mid s_k = s \text{ for infinitely many } k \ge 0 \}$ to be the set of states
that occur infinitely often in $\omega$.

• Reachability and safety objectives. Given a set $T \subseteq S$ of “target”
states, the reachability objective requires that some state of $T$
be visited. The set of winning plays is thus
$\mathrm{Reach}(T) = \{ \omega = \langle s_0, s_1, s_2, \ldots \rangle \in \Omega \mid s_k \in T \text{ for some } k \ge 0 \}$.
Given a set $F \subseteq S$, the safety objective requires that only states of $F$
be visited. Thus, the set of winning plays is
$\mathrm{Safe}(F) = \{ \omega = \langle s_0, s_1, s_2, \ldots \rangle \in \Omega \mid s_k \in F \text{ for all } k \ge 0 \}$.

• Büchi and coBüchi objectives. Given a set $B \subseteq S$ of “Büchi” states, the
Büchi objective requires that $B$ is visited infinitely often. Formally, the
set of winning plays is $\text{Büchi}(B) = \{ \omega \in \Omega \mid \mathrm{Inf}(\omega) \cap B \ne \emptyset \}$. Given
$C \subseteq S$, the coBüchi objective requires that all states visited infinitely
often are in $C$. Formally, the set of winning plays is
$\text{coBüchi}(C) = \{ \omega \in \Omega \mid \mathrm{Inf}(\omega) \subseteq C \}$.

• Parity objective. For $c, d \in \mathbb{N}$, we let $[c..d] = \{ c, c + 1, \ldots, d \}$. Let
$p : S \to [0..d]$ be a function that assigns a priority $p(s)$ to every
state $s \in S$, where $d \in \mathbb{N}$. The Even parity objective is defined as
$\mathrm{Parity}(p) = \{ \omega \in \Omega \mid \min(p(\mathrm{Inf}(\omega))) \text{ is even} \}$, and the Odd parity
objective as $\mathrm{coParity}(p) = \{ \omega \in \Omega \mid \min(p(\mathrm{Inf}(\omega))) \text{ is odd} \}$. Informally,
we say that a path $\omega$ satisfies the parity objective $\mathrm{Parity}(p)$ if
$\omega \in \mathrm{Parity}(p)$.

Note that for a priority function $p : S \to \{0, 1\}$, an even parity objective
$\mathrm{Parity}(p)$ is equivalent to the Büchi objective $\text{Büchi}(p^{-1}(0))$, i.e., the Büchi
set consists of the states with priority 0.
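The objectives above can be checked mechanically on ultimately periodic ("lasso") plays of the form prefix · cycle^ω, for which Inf is simply the set of states on the cycle. The following sketch is an illustration under that restriction; the function names and the sample play are assumptions of this example, and a general play need not be ultimately periodic.

```python
def inf_states(cycle):
    """Inf of the play prefix . cycle^omega: the states on the cycle."""
    return set(cycle)


def reach(prefix, cycle, target):
    """Reach(T): some visited state lies in the target set."""
    return any(s in target for s in prefix + cycle)


def safe(prefix, cycle, good):
    """Safe(F): every visited state lies in F."""
    return all(s in good for s in prefix + cycle)


def buchi(cycle, b):
    """Buchi(B): some state of B is visited infinitely often."""
    return bool(inf_states(cycle) & set(b))


def cobuchi(cycle, c):
    """coBuchi(C): all states visited infinitely often are in C."""
    return inf_states(cycle) <= set(c)


def parity(cycle, priority):
    """Even parity: the minimal priority seen infinitely often is even."""
    return min(priority[s] for s in inf_states(cycle)) % 2 == 0


# Sample play s0 s1 (s2 s1)^omega with priorities in {0, 1}.
PREFIX = ["s0", "s1"]
CYCLE = ["s2", "s1"]
PRIORITY = {"s0": 1, "s1": 0, "s2": 1}
BUCHI_SET = {s for s, pr in PRIORITY.items() if pr == 0}  # p^{-1}(0)
```

On this sample, `parity(CYCLE, PRIORITY)` and `buchi(CYCLE, BUCHI_SET)` agree, as the remark on priority functions with range {0, 1} predicts.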

The ability to solve games with parity objectives suffices for solving
games with arbitrary LTL (or ω-regular) objectives: in fact, it suffices to
encode the ω-regular objective as a deterministic Rabin-chain automaton
or parity automaton, solving then the game consisting of the synchronous
product of the original game with the Rabin-chain automaton [18, 25].

Given any parity winning objective, we write $\Omega_e$ to denote $\mathrm{Parity}(p)$;
this set is measurable for any choice of strategies for the two players [27].
Similarly we write $\Omega_o$ to denote $\mathrm{coParity}(p)$. Note that $\Omega_e \cap \Omega_o = \emptyset$ and
$\Omega_e \cup \Omega_o = \Omega$. Given a state $s$ we write $\Omega_e^s$ to denote $\Omega_s \cap \Omega_e$ and similarly
we write $\Omega_o^s$ to denote $\Omega_s \cap \Omega_o$. Hence, the probability that a path satisfies
objective $\mathrm{Parity}(p)$ starting from state $s \in S$, given strategies $\sigma, \pi$ for the
players, is $\Pr_s^{\sigma,\pi}(\Omega_e^s)$. Given a state $s \in S$ and a parity winning objective
$\mathrm{Parity}(p)$, we are interested in finding the maximal probability with which
player 1 can ensure that $\mathrm{Parity}(p)$ holds, and with which player 2 can ensure
that $\mathrm{coParity}(p)$ holds, from $s$. We call such probability the value of the game
$G$ at $s$ for player $i \in \{1, 2\}$. The values for player 1 and player 2 are given
by the functions $\langle\langle 1 \rangle\rangle_{val}(\Omega_e) : S \to [0, 1]$ and $\langle\langle 2 \rangle\rangle_{val}(\Omega_o) : S \to [0, 1]$,
defined for all $s \in S$ by

$$\langle\langle 1 \rangle\rangle_{val}(\Omega_e)(s) = \sup_{\sigma \in \Sigma} \inf_{\pi \in \Pi} \Pr_s^{\sigma,\pi}(\Omega_e^s)$$

$$\langle\langle 2 \rangle\rangle_{val}(\Omega_o)(s) = \sup_{\pi \in \Pi} \inf_{\sigma \in \Sigma} \Pr_s^{\sigma,\pi}(\Omega_o^s).$$

Note that the objectives of the players are complementary and hence we
have a zero-sum game. Concurrent games satisfy a quantitative version of
determinacy [15], stating that for all parity winning objectives, and all $s \in S$,
we have

$$\langle\langle 1 \rangle\rangle_{val}(\Omega_e)(s) + \langle\langle 2 \rangle\rangle_{val}(\Omega_o)(s) = 1.$$

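A strategy $\sigma$ for player 1 is optimal if for all $s \in S$ we have

$$\inf_{\pi \in \Pi} \Pr_s^{\sigma,\pi}(\Omega_e^s) = \langle\langle 1 \rangle\rangle_{val}(\Omega_e)(s).$$

For $\varepsilon > 0$, a strategy $\sigma$ for player 1 is ε-optimal if for all $s \in S$ we have

$$\inf_{\pi \in \Pi} \Pr_s^{\sigma,\pi}(\Omega_e^s) \ge \langle\langle 1 \rangle\rangle_{val}(\Omega_e)(s) - \varepsilon.$$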

We define optimal and ε-optimal strategies for player 2 symmetrically. Note
that the quantitative determinacy of concurrent games is equivalent to the
existence of ε-optimal strategies for both players, for all ε > 0, at all states
$s \in S$. We denote by $\langle\langle 1 \rangle\rangle_{limit} = \{ s \mid \langle\langle 1 \rangle\rangle_{val}(\Omega_e)(s) = 1 \}$ and
$\langle\langle 2 \rangle\rangle_{limit} = \{ s \mid \langle\langle 2 \rangle\rangle_{val}(\Omega_o)(s) = 1 \}$ the sets of states where player 1 and
player 2 have value 1, respectively.
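To see concretely why ε-optimal strategies are needed, consider a small illustrative game (an assumption of this sketch, not an example taken from the paper): at a single non-absorbing state, player 1 chooses between "hide" and "run", player 2 between "wait" and "throw"; (run, wait) and (hide, throw) lead to a winning absorbing state for player 1, (run, throw) leads to a losing absorbing state, and (hide, wait) repeats the round. If player 1 runs with probability ε in every round and player 2 throws with probability q in every round, the winning probability has a closed form, computed below. The code only ranges over memoryless strategies of player 2, which is a simplification.

```python
from fractions import Fraction


def win_probability(eps, q):
    """Probability that player 1 reaches the winning state when she runs
    with probability eps and player 2 throws with probability q, per round."""
    win = eps * (1 - q) + (1 - eps) * q   # (run, wait) or (hide, throw)
    stay = (1 - eps) * (1 - q)            # (hide, wait): the round repeats
    if stay == 1:
        return Fraction(0)                # nobody ever moves: never wins
    return win / (1 - stay)               # geometric series over repeats


def worst_case(eps, grid=100):
    """Minimum over a grid of memoryless player-2 strategies."""
    return min(win_probability(eps, Fraction(k, grid)) for k in range(grid + 1))
```

For every ε > 0 the worst case over this grid is 1 − ε, attained when player 2 always throws, while ε = 0 (always hide) yields 0 against a player 2 who always waits. In this toy game, the randomized strategies considered here therefore get arbitrarily close to winning probability 1 without any of them attaining it.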

2.3  The  branching  structure  of  plays 

Many  of  the  arguments  developed  in  this  paper  rely  on  a  detailed  analysis 
of  the  branching  process  resulting  from  the  strategies  chosen  by  the  players, 
and  from  the  probabilistic  transition  relation  of  the  game.  In  order  to  make 
our  arguments  precise,  we  need  some  definitions.  A  play  is  feasible  if  each  of 
its  transitions  could  have  arisen  according  to  the  transition  relation  of  the 
game. 


Definition 2 (Feasible plays and outcomes) Given strategies $\sigma$ for
player 1 and $\pi$ for player 2, a play $\omega = \langle s_0, s_1, s_2, \ldots \rangle$ is feasible in
a concurrent game graph $G$ if for every $k \in \mathbb{N}$ there are moves $a_1$ and $a_2$
such that the following conditions hold: (1) $s_{k+1} \in \mathrm{Dest}(s_k, a_1, a_2)$;
(2) $\sigma(s_0, s_1, \ldots, s_k)(a_1) > 0$; and (3) $\pi(s_0, s_1, \ldots, s_k)(a_2) > 0$. Given
strategies $\sigma \in \Sigma$ and $\pi \in \Pi$, and a state $s$, we denote by
$\mathrm{Outcome}(s, \sigma, \pi) \subseteq \Omega_s$ the set of feasible plays that
start from $s$, given strategies $\sigma$ and $\pi$. ■

In  order  to  make  precise  statements  about  the  branching  process  arising 
from  a  game  play,  we  define  below  trees  labeled  by  game  states. 

Definition 3 (Infinite trees, S-labeled trees and trees for events)
An infinite tree is a set $\mathit{Tr} \subseteq \mathbb{N}^*$ such that

• if $x \cdot i \in \mathit{Tr}$ where $x \in \mathbb{N}^*$ and $i \in \mathbb{N}$, then $x \in \mathit{Tr}$;

• for all $x \in \mathit{Tr}$ there exists $i \in \mathbb{N}$ such that $x \cdot i \in \mathit{Tr}$. We refer to $x \cdot i$
as a successor of $x$.

We call the elements of $\mathit{Tr}$ nodes, and the empty word $\epsilon$ is the root of the
tree. An infinite path $\tau$ of $\mathit{Tr}$ is a set $\tau \subseteq \mathit{Tr}$ such that

• $\epsilon \in \tau$;

• for every $x$ in $\tau$ there is a unique $i \in \mathbb{N}$ such that $x \cdot i \in \tau$. Note that
for every $i \in \mathbb{N}$, there is a unique element $x \in \tau$ such that $|x| = i$.
We denote by $\tau_i$ the element $x \in \tau$ such that $|x| = i$.

Given an infinite tree $\mathit{Tr}$ and a node $x \in \mathit{Tr}$, we denote by $\mathit{Tr}(x)$ the
sub-tree rooted at node $x$. Formally, $\mathit{Tr}(x)$ denotes the set
$\{ x' \in \mathit{Tr} \mid x \text{ is a prefix of } x' \}$.

An S-labeled tree $T$ is a pair $(\mathit{Tr}, \langle \cdot \rangle)$, where $\mathit{Tr}$ is a tree and $\langle \cdot \rangle : \mathit{Tr} \to S$
maps each node of $\mathit{Tr}$ to a state $s \in S$. Given an S-labeled tree $T$ and an
infinite path $\tau \subseteq \mathit{Tr}$, we denote by $\langle \tau \rangle$ the play $\langle s_0, s_1, s_2, \ldots \rangle$ such that
$s_0 = \langle \epsilon \rangle$ and for all $i > 0$ we have $s_i = \langle \tau_i \rangle$. An S-labeled tree $T_s = (\mathit{Tr}_s, \langle \cdot \rangle)$,
where $\langle \epsilon \rangle = s$, represents a set of infinite paths, denoted $\mathcal{L}(T_s) \subseteq \Omega_s$, such
that

$$\mathcal{L}(T_s) = \{ \omega = \langle s_0 = s, s_1, s_2, \ldots \rangle \in \Omega_s \mid \exists \tau \subseteq \mathit{Tr}_s.\ \langle \tau \rangle = \omega \}.$$

An S-labeled tree $T_s$ represents an event $A \subseteq \Omega_s$ if and only if $\mathcal{L}(T_s) = A$.
We denote by $T_{A,s}$ an S-labeled tree that represents an event $A \subseteq \Omega_s$, and
denote by $\mathit{Tr}_{A,s}$ the tree of $T_{A,s}$. ■

Several of the following results will be phrased in terms of the S-labeled tree
$T_{A,s}^{\sigma,\pi}$, which represents the outcomes from $s \in S$ that result from player 1
using strategy $\sigma$ and player 2 using strategy $\pi$, and that belong to a specified
event $A$.

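Definition 4 (Trees for strategies) Given a measurable event $A$, strategies
$\sigma, \pi$, and a state $s$ such that $\Pr_s^{\sigma,\pi}(A) > 0$, we denote by $T_{A,s}^{\sigma,\pi}$ an S-labeled
tree that represents $A \cap \mathrm{Outcome}(s, \sigma, \pi)$, and we also denote by $\mathit{Tr}_{A,s}^{\sigma,\pi}$ the
tree of $T_{A,s}^{\sigma,\pi}$. Given strategies $\sigma, \pi$, we denote by $T_s^{\sigma,\pi}$ the S-labeled tree
$T_{\mathrm{Outcome}(s,\sigma,\pi),s}^{\sigma,\pi}$, and we also denote by $\mathit{Tr}_s^{\sigma,\pi}$ the tree of $T_s^{\sigma,\pi}$. ■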

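Notations. Let $T = (\mathit{Tr}, \langle \cdot \rangle)$ be an S-labeled tree and $x \in \mathit{Tr}$ such that
$|x| = n$. We denote by $x_i$ the prefix of $x$ of length $i$. We denote by
$\mathrm{hist}(x) = (\langle \epsilon \rangle, \langle x_1 \rangle, \ldots, \langle x_n \rangle)$ the history represented by the path from the
root to the node $x$. We denote by
$\mathrm{Cone}(x) = \{ \omega = \langle s_0, s_1, s_2, \ldots \rangle \mid \langle x_i \rangle = s_i \text{ for all } 0 \le i \le n \}$
the set of paths with the prefix $\mathrm{hist}(x)$. Given a measurable event $A \subseteq \Omega_s$
and strategies $\sigma$ and $\pi$ such that $\Pr_s^{\sigma,\pi}(A) > 0$, consider the S-labeled tree
$T_{A,s}^{\sigma,\pi}$ that represents $A \cap \mathrm{Outcome}(s, \sigma, \pi)$. Consider the event
$A_{nu} = \bigcup \{ \mathrm{Cone}(x) \mid x \in \mathit{Tr}_{A,s}^{\sigma,\pi},\ \Pr_s^{\sigma,\pi}(\mathrm{Cone}(x) \cap A) = 0 \}$.
Since $A_{nu} \cap A$ is a countable union of measurable sets, each with measure 0,
we have $\Pr_s^{\sigma,\pi}(A_{nu} \cap A) = 0$. Hence, in the sequel, without loss of
generality, given any event $A$ we only consider the event $A \setminus A_{nu}$ and, by a
slight abuse of notation, use $T_{A,s}^{\sigma,\pi}$ to represent the stochastic tree
$T_{A \setminus A_{nu},s}^{\sigma,\pi}$. Hence, without loss of generality we assume that for any
$x \in \mathit{Tr}_{A,s}^{\sigma,\pi}$ we have $\Pr_s^{\sigma,\pi}(\mathrm{Cone}(x) \cap A) > 0$. Henceforth, for any
$x \in \mathit{Tr}_{A,s}^{\sigma,\pi}$ we write $\Pr_x^{\sigma,\pi}(\cdot \mid A)$ to denote
$\Pr_s^{\sigma,\pi}(\cdot \mid \mathrm{Cone}(x), A)$.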
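The tree notation can be made concrete on a finite, depth-bounded fragment of an S-labeled tree. In the sketch below, nodes are tuples of natural numbers (the root is the empty tuple), the labeling is a dictionary, and the helper names are assumptions of this illustration. A finite fragment cannot satisfy the second tree condition (every node has a successor), so only prefix-closure is checked.

```python
def is_prefix_closed(nodes):
    """First tree condition: if x.i is a node, then so is x."""
    return all(x[:-1] in nodes for x in nodes if x)


def hist(x, label):
    """hist(x): the labels along the path from the root () to node x."""
    return [label[x[:i]] for i in range(len(x) + 1)]


def in_cone(play_prefix, x, label):
    """Whether a finite play prefix (at least as long as hist(x)) starts
    with hist(x), i.e. whether its infinite extensions lie in Cone(x)."""
    h = hist(x, label)
    return len(play_prefix) >= len(h) and play_prefix[:len(h)] == h


# Depth-2 fragment of a tree labeled by states "s", "win" and "lose".
LABEL = {
    (): "s",
    (0,): "s",
    (1,): "win",
    (0, 0): "s",
    (0, 1): "lose",
}
NODES = set(LABEL)
```

Here `hist((0, 1), LABEL)` is the sequence s, s, lose, and any play beginning with these three states belongs to the cone of the node (0, 1).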

Definition 5 (Perennial ε-optimal strategies) For all $\varepsilon > 0$, a strategy
$\sigma$ is a perennial ε-optimal strategy for player 1, from state $s$, if for all
strategies $\pi$ and all nodes $x$ in the stochastic tree $\mathit{Tr}_s^{\sigma,\pi}$, we have
$\Pr_x^{\sigma,\pi}(\Omega_e^s) \ge \langle\langle 1 \rangle\rangle_{val}(\Omega_e)(\langle x \rangle) - \varepsilon$, i.e., in the stochastic sub-tree
rooted at $x$ player 1 is ensured the value of the game at $\langle x \rangle$ within
ε-precision. Perennial ε-optimal strategies for player 2 are defined
analogously. We denote by $\Sigma_\varepsilon$ and $\Pi_\varepsilon$ the sets of perennial ε-optimal
strategies for player 1 and player 2, respectively. ■

The ε-optimal strategies constructed for parity objectives in [8] are
perennial ε-optimal strategies. This gives us the following proposition.

Proposition 1 For all $\varepsilon > 0$, we have $\Sigma_\varepsilon \ne \emptyset$ and $\Pi_\varepsilon \ne \emptyset$.

3  Games  with  Reachability  Objectives


In this section we show that the values of a concurrent parity game can
be related to the ε-Nash equilibrium of a non-zero sum reachability game.
This generalizes the well-known result for MDPs, stating that for all parity
objectives the value of an MDP is equal to the value of reaching the set
of states with value 1.

3.1  Non-zero  sum  reachability  game 

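In the sequel, we consider stochastic trees $T_{A,s}^{\sigma,\pi}$ such that $\Pr_s^{\sigma,\pi}(A) > 0$. Given
a stochastic tree $T_{A,s}^{\sigma,\pi}$, let $\kappa$ be a subset of nodes, i.e., $\kappa \subseteq \mathit{Tr}_{A,s}^{\sigma,\pi}$. Analogous
to the definition of reachability and safety we define the following notions
of reachability and safety in the stochastic tree:

1. Reachability in tree. For a set $\kappa \subseteq \mathit{Tr}_{A,s}^{\sigma,\pi}$, let

$$\mathrm{ReachTree}(\kappa) = \{ \langle \tau \rangle \mid \tau \text{ is an infinite path in } \mathit{Tr}_{A,s}^{\sigma,\pi} \text{ such that } \exists i \in \mathbb{N}.\ \tau_i \in \kappa \}$$

denote the set of paths that reach the subset $\kappa$ of nodes.

2. Safety in tree. For a set $\kappa \subseteq \mathit{Tr}_{A,s}^{\sigma,\pi}$, let

$$\mathrm{SafeTree}(\kappa) = \{ \langle \tau \rangle \mid \tau \text{ is an infinite path in } \mathit{Tr}_{A,s}^{\sigma,\pi} \text{ such that } \forall i \in \mathbb{N}.\ \tau_i \in \kappa \}$$

denote the set of paths that stay safe in the subset $\kappa$ of nodes.

Given a positive integer $k$ and a set $\kappa \subseteq \mathit{Tr}_{A,s}^{\sigma,\pi}$, we define
$\mathrm{ReachTree}_k(\kappa) = \{ \langle \tau \rangle \mid \exists x \in \tau.\ \exists i \le k.\ x_i \in \kappa \}$, i.e., the set of paths that
reach $\kappa$ within $k$ steps.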

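Lemma 1 (Reachability Lemma) Let $T_{A,s}^{\sigma,\pi}$ be a stochastic tree.

1. For a set $\kappa \subseteq \mathit{Tr}_{A,s}^{\sigma,\pi}$, if $\inf_{x \in \mathit{Tr}_{A,s}^{\sigma,\pi}} \Pr_x^{\sigma,\pi}(\mathrm{ReachTree}(\kappa) \mid A) > 0$, then
$\Pr_x^{\sigma,\pi}(\mathrm{ReachTree}(\kappa) \mid A) = 1$ for all nodes $x \in \mathit{Tr}_{A,s}^{\sigma,\pi}$.

2. For a set $U \subseteq S$, if $\inf_{x \in \mathit{Tr}_{A,s}^{\sigma,\pi}} \Pr_x^{\sigma,\pi}(\mathrm{Reach}(U) \mid A) > 0$, then
$\Pr_x^{\sigma,\pi}(\mathrm{Reach}(U) \mid A) = 1$ for all nodes $x \in \mathit{Tr}_{A,s}^{\sigma,\pi}$.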

Proof. We prove the first case and show that the second case is an
immediate consequence.

1. Let $0 < c \le \inf_{x \in \mathit{Tr}_{A,s}^{\sigma,\pi}} \Pr_x^{\sigma,\pi}(\mathrm{ReachTree}(\kappa) \mid A)$. Choose $0 < c' < c$. For
every node $x \in \mathit{Tr}_{A,s}^{\sigma,\pi}$, there exists $k_x$ such that
$\Pr_x^{\sigma,\pi}(\mathrm{ReachTree}_{k_x}(\kappa) \mid A) > c'$. Consider $k_1 = k_\epsilon$ (recall that $\epsilon$ is the
root of the tree) and consider the frontier $F_1$ of $\mathit{Tr}_{A,s}^{\sigma,\pi}$ at depth $k_1$. Given a
frontier $F$ at depth $k$, let $\overline{F}$ be the set of nodes $x$ in $F$ such that the path
from the root to $x$ has not visited a node in $\kappa$, i.e., none of
$\epsilon, x_1, x_2, \ldots, x_k$ is in $\kappa$. For a frontier $F_i$, define
$k_{i+1} = \max \{ k_x \mid x \in \overline{F}_i \}$. Inductively, define the frontier $F_{i+1}$ at
depth $\sum_{j=1}^{i+1} k_j$. It follows that for $k = \sum_{i=1}^{n} k_i$ we have
$\Pr_\epsilon^{\sigma,\pi}(\Omega_s \setminus \mathrm{ReachTree}_k(\kappa) \mid A) \le (1 - c')^n$. Since
$\lim_{n \to \infty} (1 - c')^n = 0$, the desired result follows for the root of the
tree. Since $\inf_{x \in \mathit{Tr}_{A,s}^{\sigma,\pi}} \Pr_x^{\sigma,\pi}(\mathrm{ReachTree}(\kappa) \mid A) > 0$, it follows that for
all nodes $x \in \mathit{Tr}_{A,s}^{\sigma,\pi}$ we have
$\inf_{x_1 \in \mathit{Tr}_{A,s}^{\sigma,\pi}(x)} \Pr_{x_1}^{\sigma,\pi}(\mathrm{ReachTree}(\kappa) \mid A) > 0$.
Arguing similarly for the subtree rooted at the node $x$, the desired
result follows.


Figure  1:  The  Stochastic  Tree  for  Reachability 


2. Observe that with $\kappa = \{ x \in \mathit{Tr}_{A,s}^{\sigma,\pi} \mid \langle x \rangle \in U \}$, we have
$\mathrm{Reach}(U) = \mathrm{ReachTree}(\kappa)$. The result is immediate from part 1. ■
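The geometric bound in part 1 can be unpacked as follows. This is a sketch of the omitted step rather than the authors' own derivation. The shorthand $E_i$ is introduced here for the event that no node of $\kappa$ is visited up to the frontier $F_i$ (with $E_0$ the sure event), and $\overline{F}_i$ denotes the nodes of $F_i$ whose history avoids $\kappa$. We assume that $\Pr_x^{\sigma,\pi}(\mathrm{ReachTree}_{k}(\kappa) \mid A)$ counts the $k$ steps from the node $x$ onwards, and that each $E_i \cap A$ has positive probability (otherwise the bound holds trivially).

```latex
\begin{align*}
\Pr\nolimits_{\epsilon}^{\sigma,\pi}(E_{i+1} \mid E_i, A)
  &= \sum_{x \in \overline{F}_i}
     \Pr\nolimits_{\epsilon}^{\sigma,\pi}(\mathrm{Cone}(x) \mid E_i, A)\,
     \bigl(1 - \Pr\nolimits_{x}^{\sigma,\pi}(\mathrm{ReachTree}_{k_{i+1}}(\kappa) \mid A)\bigr)
   \le 1 - c', \\
\Pr\nolimits_{\epsilon}^{\sigma,\pi}(E_n \mid A)
  &= \prod_{i=0}^{n-1} \Pr\nolimits_{\epsilon}^{\sigma,\pi}(E_{i+1} \mid E_i, A)
   \le (1 - c')^n \xrightarrow[n \to \infty]{} 0 .
\end{align*}
```

The first inequality uses $k_{i+1} \ge k_x$ for every $x \in \overline{F}_i$, so that each node of $\overline{F}_i$ reaches $\kappa$ within $k_{i+1}$ further steps with probability greater than $c'$.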

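Notations. Let $A \subseteq \Omega_s$ be a measurable event such that $\Pr_s^{\sigma,\pi}(A) > 0$.
For a set $B \subseteq S$, let $\mathrm{InfSet}(B) = \{ \omega \mid \mathrm{Inf}(\omega) \subseteq B \}$ and let
$\mathrm{InfSetEq}(B) = \{ \omega \mid \mathrm{Inf}(\omega) = B \}$. Given a node $x$ in $\mathit{Tr}_{A,s}^{\sigma,\pi}$ and $\varepsilon > 0$, we
define $\mathcal{C}_{A,\varepsilon}^{\sigma,\pi}(x)$ as follows:

$$\mathcal{C}_{A,\varepsilon}^{\sigma,\pi}(x) = \{ B \subseteq S \mid \Pr_x^{\sigma,\pi}(\mathrm{InfSet}(B) \mid A) \ge 1 - \varepsilon \}.$$

Note that for $\varepsilon_1 > 0$ and $\varepsilon_2 > 0$ such that $\varepsilon_1 < \varepsilon_2$, for any node $x \in \mathit{Tr}_{A,s}^{\sigma,\pi}$,
if $B \in \mathcal{C}_{A,\varepsilon_1}^{\sigma,\pi}(x)$ then $B \in \mathcal{C}_{A,\varepsilon_2}^{\sigma,\pi}(x)$. We define
$\mathcal{C}_A^{\sigma,\pi}(x) = \lim_{\varepsilon \to 0} \mathcal{C}_{A,\varepsilon}^{\sigma,\pi}(x)$.
The monotonicity property of $\mathcal{C}_{A,\varepsilon}^{\sigma,\pi}$ with respect to $\varepsilon$ ensures that $\mathcal{C}_A^{\sigma,\pi}(x)$
exists for all $x \in \mathit{Tr}_{A,s}^{\sigma,\pi}$.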

Lemma 2 For every node $x \in Tr_s^{\sigma,\pi}$, there is a unique minimal element of $\mathcal{C}^{\sigma,\pi}(x)$ under the $\subseteq$ ordering.

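Proof. Consider a node $x \in Tr_s^{\sigma,\pi}$. Let $B_1$ and $B_2$ be two distinct minimal elements in $\mathcal{C}^{\sigma,\pi}(x)$. Consider an arbitrary $\varepsilon > 0$. It follows from the definition that we have $\Pr_x^{\sigma,\pi}(\mathrm{InfSet}(B_i) \mid \mathcal{A}) > 1 - \varepsilon/2$ for $i \in \{1, 2\}$. By definition we must have $\Pr_x^{\sigma,\pi}(\mathrm{InfSet}(B_1 \cup B_2) \mid \mathcal{A}) \le 1$. Hence we have the following inequality:

$$\Pr_x^{\sigma,\pi}(\mathrm{InfSet}(B_1) \mid \mathcal{A}) + \Pr_x^{\sigma,\pi}(\mathrm{InfSet}(B_2) \mid \mathcal{A}) - \Pr_x^{\sigma,\pi}(\mathrm{InfSet}(B_1 \cap B_2) \mid \mathcal{A}) \le 1.$$

It follows that $\Pr_x^{\sigma,\pi}(\mathrm{InfSet}(B_1 \cap B_2) \mid \mathcal{A}) > 1 - \varepsilon$. Since $\varepsilon > 0$ was arbitrary, this holds for every $\varepsilon > 0$, and hence $B_1 \cap B_2 \in \mathcal{C}^{\sigma,\pi}(x)$. However, this contradicts the assumption that $B_1$ and $B_2$ are distinct minimal elements of $\mathcal{C}^{\sigma,\pi}(x)$. ∎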
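
The definitions of $\mathcal{C}_\varepsilon^{\sigma,\pi}(x)$ and its minimal element can be tried out on a toy example. The sketch below is only an illustration under strong assumptions: it supposes that the conditional distribution of $\mathrm{Inf}(\omega)$ at a node is finite and given explicitly (the dictionary `inf_dist` and all function names are hypothetical and do not appear in the paper), and it enumerates subsets by brute force.

```python
from fractions import Fraction as F
from itertools import combinations

def inf_set_prob(inf_dist, B):
    """Pr(InfSet(B)): probability that Inf(w) is a subset of B."""
    return sum((p for inf, p in inf_dist.items() if inf <= B), F(0))

def c_eps(states, inf_dist, eps):
    """All B subset of S with Pr(InfSet(B)) > 1 - eps (brute-force enumeration)."""
    subsets = (frozenset(c) for r in range(len(states) + 1)
               for c in combinations(sorted(states), r))
    return [B for B in subsets if inf_set_prob(inf_dist, B) > 1 - eps]

def minimal_elements(family):
    """Members of the family that have no proper subset inside the family."""
    return [B for B in family if not any(C < B for C in family)]

# Assumed toy node: Inf(w) = {a} with probability 7/10, Inf(w) = {a, b} with 3/10.
states = {"a", "b", "c"}
inf_dist = {frozenset("a"): F(7, 10), frozenset("ab"): F(3, 10)}
family = c_eps(states, inf_dist, F(1, 10))
mins = minimal_elements(family)
```

For a coarse $\varepsilon$ such as 1/2 the minimal element of the toy family is {a}; once $\varepsilon$ drops below 3/10 it stabilizes at {a, b}, which matches the role of the limit in the definition.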

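We define the function $\mathcal{M}^{\sigma,\pi} : Tr_s^{\sigma,\pi} \to 2^S$ that assigns to every node $x \in Tr_s^{\sigma,\pi}$ the minimum element of $\mathcal{C}^{\sigma,\pi}(x)$. Formally, we have

$$\mathcal{M}^{\sigma,\pi}(x) = \bigcap_{B \in \mathcal{C}^{\sigma,\pi}(x)} B = \lim_{\varepsilon \to 0} \bigcap_{B \in \mathcal{C}_\varepsilon^{\sigma,\pi}(x)} B.$$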

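Proposition 2 For every $x \in Tr_s^{\sigma,\pi}$, for every successor $x_1$ of $x$ we have $\mathcal{M}^{\sigma,\pi}(x_1) \subseteq \mathcal{M}^{\sigma,\pi}(x)$.

Proof. By definition, for any nodes $x, x_1 \in Tr_s^{\sigma,\pi}$ such that $x_1$ is a successor of $x$ we have $\mathcal{C}^{\sigma,\pi}(x_1) \supseteq \mathcal{C}^{\sigma,\pi}(x)$. The result is an easy consequence of this fact. ∎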

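Lemma 3 Given an $S$-labeled tree $T_s^{\sigma,\pi}$, for all nodes $x \in Tr_s^{\sigma,\pi}$, for all $\varepsilon > 0$, there is a set $B \subseteq S$, and $x_1 \in Tr_s^{\sigma,\pi}(x)$, such that

$$\Pr_{x_1}^{\sigma,\pi}(\mathrm{InfSetEq}(B) \mid \mathcal{A}) > 1 - \varepsilon.$$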
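Proof. The proof is by induction on $|\mathcal{M}^{\sigma,\pi}(x)|$.

Base Case. If $|\mathcal{M}^{\sigma,\pi}(x)| = 1$, let $\mathcal{M}^{\sigma,\pi}(x) = \{s\}$. Then for all nodes $x_1 \in Tr_s^{\sigma,\pi}(x)$ we have $\Pr_{x_1}^{\sigma,\pi}(\mathrm{InfSet}(\{s\}) \mid \mathcal{A}) > 1 - \varepsilon$, for all $\varepsilon > 0$. Thus for all nodes $x_1 \in Tr_s^{\sigma,\pi}(x)$, for all $\varepsilon > 0$, we have $\Pr_{x_1}^{\sigma,\pi}(\mathrm{InfSetEq}(\{s\}) \mid \mathcal{A}) > 1 - \varepsilon$.

Inductive Case. Suppose there exists a node $x_1 \in Tr_s^{\sigma,\pi}(x)$ such that $\mathcal{M}^{\sigma,\pi}(x_1) \subset \mathcal{M}^{\sigma,\pi}(x)$; then $|\mathcal{M}^{\sigma,\pi}(x_1)| < |\mathcal{M}^{\sigma,\pi}(x)|$ and the result follows by the inductive hypothesis at $x_1$. Otherwise for every node $x_1 \in Tr_s^{\sigma,\pi}(x)$ we have $\mathcal{M}^{\sigma,\pi}(x_1) = \mathcal{M}^{\sigma,\pi}(x)$. Let the set $\mathcal{M}^{\sigma,\pi}(x)$ be $B$. We have

$$\lim_{\varepsilon \to 0} \bigcap_{x_1 \in Tr_s^{\sigma,\pi}(x)} \Big( \bigcap_{D \in \mathcal{C}_\varepsilon^{\sigma,\pi}(x_1)} D \Big) = B.$$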


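• Suppose we have $\inf_{x_1 \in Tr_s^{\sigma,\pi}(x)} \Pr_{x_1}^{\sigma,\pi}(\mathrm{Reach}(\{s\}) \mid \mathcal{A}) > 0$, for all states $s \in B$. Then it follows from Lemma 1 that for all nodes $x_1 \in Tr_s^{\sigma,\pi}(x)$ we have $\Pr_{x_1}^{\sigma,\pi}(\mathrm{Reach}(\{s\}) \mid \mathcal{A}) = 1$. Hence for all nodes $x_1 \in Tr_s^{\sigma,\pi}(x)$ we have $\Pr_{x_1}^{\sigma,\pi}(\mathrm{InfSetEq}(B) \mid \mathcal{A}) = 1$.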


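• Otherwise, consider a state $s \in B$ such that $\inf_{x_1 \in Tr_s^{\sigma,\pi}(x)} \Pr_{x_1}^{\sigma,\pi}(\mathrm{Reach}(\{s\}) \mid \mathcal{A}) = 0$. It follows that, for every $\varepsilon > 0$, there is a node $x_1 \in Tr_s^{\sigma,\pi}(x)$ such that $\Pr_{x_1}^{\sigma,\pi}(\mathrm{InfSet}(B \setminus \{s\}) \mid \mathcal{A}) > 1 - \varepsilon$. Formally, we have $\lim_{\varepsilon \to 0} \bigcap_{x_1 \in Tr_s^{\sigma,\pi}(x)} \big( \bigcap_{D \in \mathcal{C}_\varepsilon^{\sigma,\pi}(x_1)} D \big) \subseteq B \setminus \{s\}$. This contradicts the fact that for all nodes $x_1 \in Tr_s^{\sigma,\pi}(x)$ we have $\mathcal{M}^{\sigma,\pi}(x_1) = B$ (i.e., $\lim_{\varepsilon \to 0} \bigcap_{x_1 \in Tr_s^{\sigma,\pi}(x)} \big( \bigcap_{D \in \mathcal{C}_\varepsilon^{\sigma,\pi}(x_1)} D \big) = B$).

The desired result follows. ∎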


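Lemma 4 For every stochastic tree $T_s^{\sigma,\pi}$, for every node $x \in Tr_s^{\sigma,\pi}$ one of the following conditions holds:

1. for all $\varepsilon > 0$, there is a node $x_1 \in Tr_s^{\sigma,\pi}(x)$ such that $\Pr_{x_1}^{\sigma,\pi}(\Omega_e^s \mid \mathcal{A}) > 1 - \varepsilon$;

2. for all $\varepsilon > 0$, there is a node $x_1 \in Tr_s^{\sigma,\pi}(x)$ such that $\Pr_{x_1}^{\sigma,\pi}(\Omega_o^s \mid \mathcal{A}) > 1 - \varepsilon$.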

Proof. It follows from Lemma 3 that for all $\varepsilon > 0$, there is a node $x_1 \in Tr_s^{\sigma,\pi}(x)$, and a set $B$ such that $\Pr_{x_1}^{\sigma,\pi}(\mathrm{InfSetEq}(B) \mid \mathcal{A}) > 1 - \varepsilon$. If $\min(p(B))$ is even then condition 1 is satisfied; otherwise condition 2 is satisfied. ∎
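
The case split in this proof depends only on the parity of the least priority occurring in $B$. A minimal sketch of that test follows; the dictionary `priority` and the function name are illustrative assumptions, not notation from the paper.

```python
def satisfied_condition(B, priority):
    """Return 1 if the least priority among states in B is even, otherwise 2.

    B plays the role of the set of states visited infinitely often;
    priority maps each state to its (non-negative integer) priority.
    """
    least = min(priority[s] for s in B)
    return 1 if least % 2 == 0 else 2

# Assumed toy priority function.
priority = {"s0": 2, "s1": 1, "s2": 3}
```

For instance, with the toy priorities above, the set {s0, s2} has least priority 2 and falls under condition 1, while {s0, s1} has least priority 1 and falls under condition 2.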
We now show that solving the zero-sum parity game is equivalent to computing the states where the value of one of the players is 1 and then solving for some special ε-Nash equilibrium of a non-zero-sum reachability game. Consider a game graph $G$ with winning objectives $\Omega_e$ for player 1 and $\Omega_o$ for player 2. In the sequel we write $W_1 = \langle\!\langle 1 \rangle\!\rangle_{limit}$ and $W_2 = \langle\!\langle 2 \rangle\!\rangle_{limit}$. We will prove that if both players play one of their perennial ε-optimal strategies, with $\varepsilon \to 0$, then the probability of $\Omega_e$ being satisfied is equal to the probability of reaching $W_1$ and the probability of $\Omega_o$ being satisfied is equal to the probability of reaching $W_2$. For a set $T \subseteq S$ we denote by $\overline{T}$ the set $S \setminus T$. Given a state $s$ and a set $T$ of vertices we write $\mathrm{Safe}_s(T)$ to denote $\mathrm{Safe}(T) \cap \Omega_s$ and $\mathrm{Reach}_s(T)$ to denote $\mathrm{Reach}(T) \cap \Omega_s$.

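Lemma 5 (Reachability with ε-optimal strategies) Given a game $G$, consider a strategy pair $(\sigma, \pi) \in \Sigma_\varepsilon \times \Pi_\varepsilon$, with $\varepsilon \to 0$. For all states $s$, for all nodes $x \in Tr_s^{\sigma,\pi}$ we have $\Pr_x^{\sigma,\pi}(\mathrm{Safe}_s(\overline{W_1 \cup W_2})) = 0$.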

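Proof. Let $0 < 2 \cdot \eta < \alpha = \min\{ \langle\!\langle 1 \rangle\!\rangle_{val}(\Omega_e)(s), \langle\!\langle 2 \rangle\!\rangle_{val}(\Omega_o)(s) \mid s \in \overline{W_1 \cup W_2} \}$, i.e., $\alpha$ is the least positive value for player 1 or player 2. Consider a strategy pair $(\sigma, \pi) \in \Sigma_\eta \times \Pi_\eta$, i.e., the strategies are perennial η-optimal strategies. Let $U_s^{\sigma,\pi} = \{ x \in Tr_s^{\sigma,\pi} \mid s \in \overline{W_1 \cup W_2} \text{ and } \Pr_x^{\sigma,\pi}(\mathrm{Safe}_s(\overline{W_1 \cup W_2})) > 0 \}$. If $U_s^{\sigma,\pi}$ is empty the desired result follows. Assume for the sake of contradiction that $U_s^{\sigma,\pi}$ is non-empty. Let $x$ be a node in $U_s^{\sigma,\pi}$ and consider the $S$-labeled subtree $T_s^{\sigma,\pi}(x)$ rooted at $x$. Since $\Pr_x^{\sigma,\pi}(\mathrm{Safe}_s(\overline{W_1 \cup W_2})) > 0$, we must have $\inf_{x_1 \in Tr_s^{\sigma,\pi}(x)} \Pr_{x_1}^{\sigma,\pi}(\mathrm{Reach}_s(W_1 \cup W_2)) = 0$. Otherwise, it follows from Lemma 1 that if $\inf_{x_1 \in Tr_s^{\sigma,\pi}(x)} \Pr_{x_1}^{\sigma,\pi}(\mathrm{Reach}_s(W_1 \cup W_2)) > 0$, then $\Pr_x^{\sigma,\pi}(\mathrm{Reach}_s(W_1 \cup W_2)) = 1$. Since $\inf_{x_1 \in Tr_s^{\sigma,\pi}(x)} \Pr_{x_1}^{\sigma,\pi}(\mathrm{Reach}_s(W_1 \cup W_2)) = 0$ we have $\sup_{x_1 \in Tr_s^{\sigma,\pi}(x)} \Pr_{x_1}^{\sigma,\pi}(\mathrm{Safe}_s(\overline{W_1 \cup W_2})) = 1$. Consider a node $x_1 \in Tr_s^{\sigma,\pi}(x)$ such that $\Pr_{x_1}^{\sigma,\pi}(\mathrm{Safe}_s(\overline{W_1 \cup W_2})) > 1 - \eta$. Let $\mathcal{A}$ be the event $\mathrm{Safe}_s(\overline{W_1 \cup W_2})$. Since $\sigma$ and $\pi$ are perennial η-optimal strategies, and $\Pr_{x_1}^{\sigma,\pi}(\mathcal{A}) > 1 - \eta$, it follows that for every node $x_2 \in Tr_s^{\sigma,\pi}(x_1)$ we have

$$\Pr_{x_2}^{\sigma,\pi}(\Omega_e^s \mid \mathcal{A}) > c_1 > (\alpha - 2\eta) > 0 \quad\text{and}\quad \Pr_{x_2}^{\sigma,\pi}(\Omega_o^s \mid \mathcal{A}) > c_2 > (\alpha - 2\eta) > 0.$$

This implies that for all nodes $x_2 \in Tr_s^{\sigma,\pi}(x_1)$ we have $\Pr_{x_2}^{\sigma,\pi}(\Omega_e^s \mid \mathcal{A}) < 1 - c_2$ and $\Pr_{x_2}^{\sigma,\pi}(\Omega_o^s \mid \mathcal{A}) < 1 - c_1$. It follows from Lemma 4 that for every $\varepsilon > 0$, there is a node $x_2 \in Tr_s^{\sigma,\pi}(x_1)$ such that either $\Pr_{x_2}^{\sigma,\pi}(\Omega_e^s \mid \mathcal{A}) > 1 - \varepsilon$ or $\Pr_{x_2}^{\sigma,\pi}(\Omega_o^s \mid \mathcal{A}) > 1 - \varepsilon$. Since $c_1$ and $c_2$ are constants greater than 0, we have a contradiction. Hence $U_s^{\sigma,\pi} = \emptyset$ and the lemma follows. ∎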

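Lemma 6 Given a game $G$, let the winning objectives of player 1 and player 2 be $\Omega_e$ and $\Omega_o$, respectively. Then

$$\lim_{\varepsilon \to 0} \sup_{\sigma \in \Sigma_\varepsilon} \inf_{\pi \in \Pi_\varepsilon} \Pr_s^{\sigma,\pi}(\mathrm{Reach}_s(W_1)) = \langle\!\langle 1 \rangle\!\rangle_{val}(\Omega_e)(s)$$

$$\lim_{\varepsilon \to 0} \sup_{\pi \in \Pi_\varepsilon} \inf_{\sigma \in \Sigma_\varepsilon} \Pr_s^{\sigma,\pi}(\mathrm{Reach}_s(W_2)) = \langle\!\langle 2 \rangle\!\rangle_{val}(\Omega_o)(s)$$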
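Proof. Given any strategies $\sigma$ and $\pi$ we have the following equality:

$$\Pr_s^{\sigma,\pi}(\Omega_e^s) = \Pr_s^{\sigma,\pi}(\Omega_e^s \cap \mathrm{Safe}_s(\overline{W_1 \cup W_2})) + \Pr_s^{\sigma,\pi}(\Omega_e^s \cap \mathrm{Reach}_s(W_1 \cup W_2)).$$

It follows from the definition of ε-optimal strategies and determinacy of parity games [15, 8] that for all states $s$ we have $\langle\!\langle 1 \rangle\!\rangle_{val}(\Omega_e)(s) = \lim_{\varepsilon \to 0} \sup_{\sigma \in \Sigma_\varepsilon} \inf_{\pi \in \Pi_\varepsilon} \Pr_s^{\sigma,\pi}(\Omega_e^s)$. For any state $s$ we have the following containment relation: $\Omega_e^s \cap \mathrm{Safe}_s(\overline{W_1 \cup W_2}) \subseteq \mathrm{Safe}_s(\overline{W_1 \cup W_2})$. It follows from Lemma 5 that $\lim_{\varepsilon \to 0} \sup_{(\sigma,\pi) \in \Sigma_\varepsilon \times \Pi_\varepsilon} \Pr_s^{\sigma,\pi}(\Omega_e^s \cap \mathrm{Safe}_s(\overline{W_1 \cup W_2})) = 0$. Hence we have

$$\langle\!\langle 1 \rangle\!\rangle_{val}(\Omega_e)(s) = \lim_{\varepsilon \to 0} \sup_{\sigma \in \Sigma_\varepsilon} \inf_{\pi \in \Pi_\varepsilon} \Pr_s^{\sigma,\pi}(\Omega_e^s \cap \mathrm{Reach}_s(W_1 \cup W_2)).$$

Since $\sigma$ and $\pi$ are ε-optimal strategies we have the following two facts:

$$\lim_{\varepsilon \to 0} \sup_{\sigma \in \Sigma_\varepsilon} \inf_{\pi \in \Pi_\varepsilon} \Pr_s^{\sigma,\pi}(\Omega_e^s \cap \mathrm{Reach}_s(W_2)) = 0,$$

$$\lim_{\varepsilon \to 0} \sup_{\sigma \in \Sigma_\varepsilon} \inf_{\pi \in \Pi_\varepsilon} \Pr_s^{\sigma,\pi}(\Omega_e^s \mid \mathrm{Reach}_s(W_1)) = 1.$$

This gives us the following equality:

$$\langle\!\langle 1 \rangle\!\rangle_{val}(\Omega_e)(s) = \lim_{\varepsilon \to 0} \sup_{\sigma \in \Sigma_\varepsilon} \inf_{\pi \in \Pi_\varepsilon} \Pr_s^{\sigma,\pi}(\Omega_e^s \cap \mathrm{Reach}_s(W_1)).$$

The right-hand side of the equality can be expressed as

$$\lim_{\varepsilon \to 0} \sup_{\sigma \in \Sigma_\varepsilon} \inf_{\pi \in \Pi_\varepsilon} \Pr_s^{\sigma,\pi}(\Omega_e^s \mid \mathrm{Reach}_s(W_1)) \cdot \Pr_s^{\sigma,\pi}(\mathrm{Reach}_s(W_1)) = \lim_{\varepsilon \to 0} \sup_{\sigma \in \Sigma_\varepsilon} \inf_{\pi \in \Pi_\varepsilon} \Pr_s^{\sigma,\pi}(\mathrm{Reach}_s(W_1)).$$

This gives us the desired result. ∎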

Consider the following two variants of the game $G$, the games $G_a$ and $G_r$, which have the same state space as $G$ and in which the states in $W_1$ and $W_2$ are changed to absorbing states. $G_a$ is a zero-sum parity game in which the priority of each state in $W_1$ is 0, the priority of each state in $W_2$ is 1, and the priority of every other state is the same as in $G$. Note that for every state $s$ the values for player 1 and player 2 in $G$ and in $G_a$ are the same. The game $G_r$ is a non-zero-sum reachability game in which the winning objectives of both players are reachability objectives: the objective for player 1 is $\mathrm{Reach}(W_1)$ and the objective for player 2 is $\mathrm{Reach}(W_2)$. Note that the game $G_r$ is not zero-sum in the following sense: there are infinite paths $\omega$ such that $\omega \notin \mathrm{Reach}(W_1)$ and $\omega \notin \mathrm{Reach}(W_2)$, and each player gets payoff 0 for the path $\omega$. We define ε-Nash equilibria of the game $G_r$ and relate some special ε-Nash equilibria of $G_r$ to the values of $G$.
Definition 6 (ε-Nash equilibrium in $G_r$) A strategy profile $(\sigma^*, \pi^*) \in \Sigma \times \Pi$ is an ε-Nash equilibrium at state $s$ if the following two conditions hold:

$$\forall \sigma \in \Sigma.\ \Pr_s^{\sigma^*,\pi^*}(\mathrm{Reach}_s(W_1)) \ge \Pr_s^{\sigma,\pi^*}(\mathrm{Reach}_s(W_1)) - \varepsilon$$

$$\forall \pi \in \Pi.\ \Pr_s^{\sigma^*,\pi^*}(\mathrm{Reach}_s(W_2)) \ge \Pr_s^{\sigma^*,\pi}(\mathrm{Reach}_s(W_2)) - \varepsilon \quad ∎$$
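
The two conditions of the definition can be illustrated on finite data. The sketch below assumes, purely for illustration, that each player has only finitely many candidate strategies and that the reachability probabilities have already been tabulated (the tables `reach1` and `reach2`, the strategy labels, and the function name are hypothetical); in the games considered here the strategy sets are infinite, so this is not a decision procedure.

```python
def is_eps_nash(reach1, reach2, sigma_star, pi_star, eps):
    """Check the two epsilon-Nash inequalities on tabulated probabilities.

    reach1[(sigma, pi)] stands for Pr(Reach_s(W1)) under the pair (sigma, pi);
    reach2[(sigma, pi)] stands for Pr(Reach_s(W2)).
    """
    sigmas = {s for s, _ in reach1}
    pis = {p for _, p in reach1}
    base1 = reach1[(sigma_star, pi_star)]
    base2 = reach2[(sigma_star, pi_star)]
    # Player 1 cannot gain more than eps by deviating from sigma_star.
    ok1 = all(base1 >= reach1[(s, pi_star)] - eps for s in sigmas)
    # Player 2 cannot gain more than eps by deviating from pi_star.
    ok2 = all(base2 >= reach2[(sigma_star, p)] - eps for p in pis)
    return ok1 and ok2

# Assumed toy tables; the two payoffs need not sum to 1 (the game is not zero-sum).
reach1 = {("a", "x"): 0.5, ("b", "x"): 0.55, ("a", "y"): 0.4, ("b", "y"): 0.6}
reach2 = {("a", "x"): 0.3, ("b", "x"): 0.2, ("a", "y"): 0.35, ("b", "y"): 0.1}
```

In the toy tables the profile ("a", "x") passes the check for ε = 0.1 but fails it for ε = 0.01, because player 1 gains 0.05 by switching to "b".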

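Theorem 1 (Nash equilibrium of reachability game $G_r$ associated with the parity game $G$) The following assertion holds for the game $G_r$.

1. For all $\varepsilon > 0$, there is an ε-Nash equilibrium $(\sigma^*, \pi^*)$ such that for all states $s$ we have

$$\lim_{\varepsilon \to 0} \Pr_s^{\sigma^*,\pi^*}(\mathrm{Reach}_s(W_1)) = \langle\!\langle 1 \rangle\!\rangle_{val}(\Omega_e)(s)$$

$$\lim_{\varepsilon \to 0} \Pr_s^{\sigma^*,\pi^*}(\mathrm{Reach}_s(W_2)) = \langle\!\langle 2 \rangle\!\rangle_{val}(\Omega_o)(s).$$

Proof. It follows from Lemma 6 and Proposition 1. ∎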

4 Strategy Characterization and Computational Complexity

In this section we construct polynomial witnesses for perennial ε-optimal strategies and describe a polynomial procedure to verify the witnesses. An immediate consequence is that the values of concurrent parity games can be decided within ε-precision in NP ∩ coNP. Since the values can be irrational, one can only hope to ε-approximate them. Our proof techniques reveal several key characteristics of perennial ε-optimal strategies. Perennial ε-optimal strategies require infinite memory in general [6, 8]. We show that, nevertheless, there exist perennial ε-optimal strategies that in the limit coincide with some memoryless strategies. This result parallels the celebrated result of Mertens-Neyman [17] for concurrent games with limit-average objectives, which states that there exist ε-optimal strategies that in the limit coincide with some memoryless strategies (the memoryless strategies correspond to the memoryless optimal strategies in the discounted game as the discount factor tends to 0). Note that the memoryless strategy with which the perennial ε-optimal strategies coincide in the limit is itself not necessarily ε-optimal.
In concurrent games with safety objectives, optimal memoryless strategies exist, and the optimal strategies in general require randomization [11]. In concurrent games with reachability objectives, optimal strategies need not exist, but memoryless ε-optimal strategies exist for all ε > 0 [11], and the ε-optimal strategies require randomization. In concurrent games with Büchi objectives, ε-optimal strategies require infinite memory in general [6]. In contrast, we show that for all ε > 0, memoryless ε-optimal strategies exist for all concurrent games with coBüchi objectives. It follows from the simpler case of reachability objectives that optimal strategies need not exist and that ε-optimal strategies require randomization. It follows from the results on Büchi objectives that in concurrent games with parity objectives with 3 or more priorities, ε-optimal strategies require infinite memory in general. Our result thus completes the characterization of the memory requirements of ε-optimal strategies in concurrent parity games.

4.1  Reduction  to  Qualitative  Witness 

The notion of local optimality will play an important role in our construction of polynomial witnesses. Informally, a selector function $\xi$ is locally optimal if it is optimal in the one-step matrix game where each state $s$ is assigned the reward value $\langle\!\langle 1 \rangle\!\rangle_{val}(\Omega_e)(s)$. A locally optimal strategy is a strategy that consists of locally optimal selectors. A locally ε-optimal strategy is a strategy that has a total deviation from locally optimal selectors of at most ε. Locally optimal selectors and strategies play a role in the construction of polynomial witnesses, since local optimality is a notion that can be checked in polynomial time.

We note that local ε-optimality and ε-optimality are very different notions. Local ε-optimality consists in the approximation of a local selector; a locally ε-optimal strategy provides no guarantee of yielding a probability of winning the game close to the optimal one. On the other hand, an ε-optimal strategy is a strategy that guarantees a probability of winning close to the optimal one; there are no constraints on its local structure. The construction of polynomial witnesses will depend on constructing a relation between the notion of local ε-optimality (which is polynomially checkable) and global ε-optimality (which yields a value close to the value of the game).

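Definition 7 (Locally ε-optimal selectors and strategies) A selector $\xi$ is locally optimal if for all $s \in S$ and $a_2 \in \Gamma_2(s)$ we have

$$E[\langle\!\langle 1 \rangle\!\rangle_{val}(\Omega_e)(\Theta_1) \mid s, \xi(s), a_2] \ge \langle\!\langle 1 \rangle\!\rangle_{val}(\Omega_e)(s).$$

We denote by $\Lambda^{\ell}$ the set of locally optimal selectors.

A strategy $\sigma$ is locally optimal if for every history $\langle s_0, s_1, \ldots, s_k \rangle$ we have $\sigma(s_0, s_1, \ldots, s_k) \in \Lambda^{\ell}$, i.e., player 1 plays a locally optimal selector at every stage of the play. We denote by $\Sigma^{\ell}$ the set of locally optimal strategies.

A strategy $\sigma_\varepsilon$ is locally ε-optimal if for every strategy $\pi \in \Pi$ and for every $\omega = \langle s_0, s_1, s_2, \ldots \rangle \in \mathrm{Outcome}(s, \sigma_\varepsilon, \pi)$ we have

$$\sum_{k=0}^{\infty} \max\big\{ \langle\!\langle 1 \rangle\!\rangle_{val}(\Omega_e)(s_k) - E[\langle\!\langle 1 \rangle\!\rangle_{val}(\Omega_e)(\Theta_{k+1}) \mid s_k, \sigma_\varepsilon(\omega_k), \pi(\omega_k)],\ 0 \big\} \le \varepsilon,$$

where $\omega_k = \langle s_0, s_1, \ldots, s_k \rangle$. We denote by $\Sigma^{\ell}_\varepsilon$ the set of locally ε-optimal strategies. ∎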
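
The local-optimality condition for a selector is a finite set of linear inequalities, one per state and player-2 move, which is why it can be checked in polynomial time once the values are known. The following sketch illustrates the check on a toy one-step game; the dictionary encodings of the transition function, values, and selectors are assumptions made for the example, not the paper's data structures.

```python
from fractions import Fraction as F

def expected_value(delta, value, s, xi_s, a2):
    """E[value(next state) | s, xi_s, a2] for a mixed player-1 move xi_s."""
    return sum(p1 * q * value[t]
               for a1, p1 in xi_s.items()
               for t, q in delta[(s, a1, a2)].items())

def is_locally_optimal(delta, value, moves2, xi):
    """Check E[value | s, xi(s), a2] >= value(s) for every s and every a2."""
    return all(expected_value(delta, value, s, xi[s], a2) >= value[s]
               for s in xi for a2 in moves2[s])

# Assumed toy game: a matching-pennies step from s into absorbing states
# t1 (value 1) and t0 (value 0); the value at s is 1/2.
value = {"s": F(1, 2), "t1": F(1), "t0": F(0)}
delta = {
    ("s", "a", "c"): {"t1": F(1)}, ("s", "a", "d"): {"t0": F(1)},
    ("s", "b", "c"): {"t0": F(1)}, ("s", "b", "d"): {"t1": F(1)},
}
moves2 = {"s": ["c", "d"]}
uniform = {"s": {"a": F(1, 2), "b": F(1, 2)}}
pure_a = {"s": {"a": F(1)}}
```

In this toy game the uniform selector passes the check, while the pure selector that always plays "a" fails it against the player-2 move "d". Exact rationals are used so that the comparison with the value is not affected by rounding.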

A  value  class  of  the  game  is  the  set  of  all  states  where  the  game  has  a  given 
value. 

Definition 8 (Value class) A value class $VC(r)$ is the set of states $s$ such that the value for player 1 is $r$. Formally, $VC(r) = \{ s \mid \langle\!\langle 1 \rangle\!\rangle_{val}(\Omega_e)(s) = r \}$. Note that for any game there are at most $|S|$ many value classes. By $VC_{<r}$ we denote the set $\{ s \mid \langle\!\langle 1 \rangle\!\rangle_{val}(\Omega_e)(s) < r \}$ and similarly we use $VC_{>r}$ to denote the set $\{ s \mid \langle\!\langle 1 \rangle\!\rangle_{val}(\Omega_e)(s) > r \}$. ∎
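
As a small illustration of the definition, the sketch below partitions a state space into value classes from a given value function. The value table is an assumed toy input with rational values (the actual values of a concurrent parity game may be irrational), and the function names are hypothetical.

```python
from collections import defaultdict
from fractions import Fraction as F

def value_classes(value):
    """Group states by value: returns {r: VC(r)}."""
    classes = defaultdict(set)
    for s, r in value.items():
        classes[r].add(s)
    return dict(classes)

def vc_less(value, r):
    """States whose value is strictly below r."""
    return {s for s, v in value.items() if v < r}

def vc_greater(value, r):
    """States whose value is strictly above r."""
    return {s for s, v in value.items() if v > r}

# Assumed toy value function for player 1.
value = {"s0": F(1), "s1": F(1, 2), "s2": F(1, 2), "s3": F(0), "s4": F(1, 3)}
classes = value_classes(value)
```

Since every state lies in exactly one class, the number of classes is at most the number of states, as noted in the definition.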

Intuitively, we can picture the game as a “quilt” of value classes. Two of the value classes correspond to values 1 (player 1 wins with probability arbitrarily close to 1) and 0 (player 2 wins with probability arbitrarily close to 1); the other value classes correspond to intermediate values. We construct a polynomial witness in a piecemeal fashion. We first show that we can construct, for each intermediate value class, a strategy that with probability arbitrarily close to 1 guarantees either leaving the class, or winning without leaving the class. Such a strategy can be constructed using results from [6], and has a concise (polynomial) witness. Second, we show that the above strategy can be constructed so that when the class is left, it is left via a locally ε-optimal selector. By stitching together the strategies constructed in this fashion for the various value classes, we will obtain a single polynomial witness for the complete game. The construction of a strategy in a value class relies on the following reduction.

Reduction. For a state $s$ we define the set of allowable actions as follows:

$$\mathrm{AllowActs}(s) = \{ \gamma \subseteq \Gamma_1(s) : \text{there is an optimal selector } \xi \in \Lambda^{\ell} \text{ with } \mathrm{Supp}(\xi) = \gamma \}.$$
Let $G = (S, \mathit{Moves}, \Gamma_1, \Gamma_2, \delta)$ be a concurrent game with parity objectives $\mathrm{Parity}(p)$ and $\mathrm{coParity}(p)$ for player 1 and player 2 respectively, where $p$ is the priority function. Consider a value class $VC(r)$ with $0 < r < 1$.

We construct a concurrent game $G_r = (S_r, \mathit{Moves}, \widetilde{\Gamma}_1, \widetilde{\Gamma}_2, \widetilde{\delta})$ with a priority function $\widetilde{p}$ as follows:

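1. State space. Given a state $s$ let $\mathrm{AllowActs}(s) = \{\gamma_1, \gamma_2, \ldots, \gamma_k\}$. Then we have

$$S_r = \{ \widetilde{s} \mid s \in VC(r) \} \cup \{ w_1, w_2 \} \cup \{ (\widetilde{s}, i) \mid s \in VC(r),\ i \in \{1, 2, \ldots, k\} \text{ and } \mathrm{AllowActs}(s) = \{\gamma_1, \gamma_2, \ldots, \gamma_k\} \}.$$

2. Priority function.

(a) $\widetilde{p}(\widetilde{s}) = p(s)$ for all $s \in VC(r)$.

(b) $\widetilde{p}((\widetilde{s}, i)) = p(s)$ for all $(\widetilde{s}, i) \in S_r$.

(c) $\widetilde{p}(w_1) = 0$ and $\widetilde{p}(w_2) = 1$.

3. Moves assignment.

(a) $\widetilde{\Gamma}_1(\widetilde{s}) = \{1, 2, \ldots, k\}$ such that $\mathrm{AllowActs}(s) = \{\gamma_1, \gamma_2, \ldots, \gamma_k\}$ and $\widetilde{\Gamma}_2(\widetilde{s}) = \{a_2\}$. Note that every $\widetilde{s} \in S_r$ is a player-1 turn-based state.

(b) $\widetilde{\Gamma}_1((\widetilde{s}, i)) = \{i\} \cup (\Gamma_1(s) \setminus \gamma_i)$ and $\widetilde{\Gamma}_2((\widetilde{s}, i)) = \Gamma_2(s)$. At state $(\widetilde{s}, i)$ all the moves in $\gamma_i$ are collapsed to one move $i$ and the moves not in $\gamma_i$ remain in the set of available moves.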

4. Transition function.

(a) The states $w_1$ and $w_2$ are absorbing states. Observe that player 1 has value 1 and 0 at state $w_1$ and $w_2$ respectively.

(b) For any state $\widetilde{s}$ we have $\widetilde{\delta}(\widetilde{s}, i, a_2)((\widetilde{s}, i)) = 1$. Hence at state $\widetilde{s}$ player 1 can decide which element in $\mathrm{AllowActs}(s)$ to play, and if player 1 chooses move $i$ the game proceeds to state $(\widetilde{s}, i)$.

(c) Transition function at state $(\widetilde{s}, i)$.

i. For any move $a_2 \in \Gamma_2(s)$, if there is a move $a_1 \in \gamma_i$ such that $\sum_{s' \notin VC(r)} \delta(s, a_1, a_2)(s') > 0$, then $\widetilde{\delta}((\widetilde{s}, i), i, a_2)(w_1) = 1$.

The above transition specifies that if, for a move $a_2$ for player 2 and a move $a_1 \in \gamma_i$ for player 1, the game $G$ proceeds to a different value class with positive probability, then in $G_r$ the game proceeds to the state $w_1$, which has value 1 for player 1, with probability 1. Note that since $a_1 \in \gamma_i$ and $\gamma_i \in \mathrm{AllowActs}(s)$, if in $G$ the game proceeds to a different value class with positive probability it also proceeds to $VC_{>r}$ with positive probability.

ii. For any move $a_2 \in \Gamma_2(s)$, if for every move $a_1 \in \gamma_i$ we have $\sum_{s' \in VC(r)} \delta(s, a_1, a_2)(s') = 1$ then

$$\widetilde{\delta}((\widetilde{s}, i), i, a_2)(\widetilde{s}') = \sum_{a_1 \in \gamma_i} \xi_i(a_1) \cdot \delta(s, a_1, a_2)(s')$$

where $\xi_i$ is a locally optimal selector with $\mathrm{Supp}(\xi_i) = \gamma_i$.

iii. For any move $a_1 \in (\Gamma_1(s) \setminus \gamma_i)$, for any move $a_2 \in \Gamma_2(s)$ we have:

$$\widetilde{\delta}((\widetilde{s}, i), a_1, a_2)(\widetilde{s}') = \delta(s, a_1, a_2)(s') \quad \text{for } s' \in VC(r);$$

$$\widetilde{\delta}((\widetilde{s}, i), a_1, a_2)(w_2) = \sum_{s' \notin VC(r)} \delta(s, a_1, a_2)(s').$$

Figure 2: Reduction to limit-sure

The  reduction  is  illustrated  in  Fig.  2. 
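
The three cases of the transition function at a state $(\widetilde{s}, i)$ can be sketched in code. This is a simplified illustration under stated assumptions: transition distributions are dictionaries listing only positive-probability successors, the selector weights `xi_i` on $\gamma_i$ are given, the constant `COLLAPSED` stands for the collapsed move $i$, and all names are hypothetical rather than taken from the paper.

```python
from fractions import Fraction as F

COLLAPSED = "i"  # stands for the single move that replaces all moves in gamma_i

def reduced_transition(delta, vc, s, gamma_i, xi_i, move, a2):
    """Successor distribution at state (s, i) of the reduced game.

    delta[(s, a1, a2)] maps successors to positive probabilities in G,
    vc is the value class VC(r) containing s.
    """
    if move == COLLAPSED:
        leaves = any(t not in vc
                     for a1 in gamma_i for t in delta[(s, a1, a2)])
        if leaves:                      # case (c)i: jump to w1 with probability 1
            return {"w1": F(1)}
        dist = {}                       # case (c)ii: average according to xi_i
        for a1 in gamma_i:
            for t, q in delta[(s, a1, a2)].items():
                dist[t] = dist.get(t, F(0)) + xi_i[a1] * q
        return dist
    dist = {}                           # case (c)iii: a move outside gamma_i
    for t, q in delta[(s, move, a2)].items():
        key = t if t in vc else "w2"    # mass leaving the class goes to w2
        dist[key] = dist.get(key, F(0)) + q
    return dist

# Assumed toy data: VC(r) = {s, u}; "hi" and "lo" lie in other value classes.
vc = {"s", "u"}
gamma_i, xi_i = ["a", "b"], {"a": F(1, 2), "b": F(1, 2)}
delta = {
    ("s", "a", "c"): {"u": F(1)}, ("s", "b", "c"): {"s": F(1)},
    ("s", "a", "d"): {"hi": F(1, 2), "lo": F(1, 2)}, ("s", "b", "d"): {"u": F(1)},
    ("s", "e", "c"): {"s": F(1, 2), "lo": F(1, 2)}, ("s", "e", "d"): {"u": F(1)},
}
```

In the toy data, the collapsed move against "c" stays inside the class and averages the two rows, the collapsed move against "d" leaves the class and is redirected to $w_1$, and the outside move "e" against "c" sends its escaping mass to $w_2$.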

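Fact 1. If player 1 follows a strategy $\widetilde{\sigma}_\varepsilon$ such that at any state $(\widetilde{s}, i)$ it plays action $i$ with probability 1, then for every strategy $\widetilde{\pi}$ for player 2 we have $\Pr^{\widetilde{\sigma}_\varepsilon, \widetilde{\pi}}(\mathrm{Reach}(\{w_2\})) = 0$.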

Lemma 7 For every $r > 0$, for any state $s \in VC(r)$, the state $\widetilde{s}$ is limit-sure winning in the game $G_r$ for player 1, i.e., from state $\widetilde{s}$ player 1 can win with probability arbitrarily close to 1.
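Proof. Let $\sigma_\varepsilon$ be a locally ε-optimal and perennial ε-optimal strategy in $G$, i.e., $\sigma_\varepsilon \in \Sigma_\varepsilon \cap \Sigma^{\ell}_\varepsilon$ (the fact that $\Sigma_\varepsilon \cap \Sigma^{\ell}_\varepsilon \ne \emptyset$ follows from the results of [8]). Assume for the sake of contradiction that $U \subseteq S_r \cap \{ \widetilde{s} \mid s \in VC(r) \}$ is a non-empty set such that player 2 wins with bounded-positive probability. Let $\widetilde{\pi}$ be a perennial bounded-positive optimal strategy for player 2 from the set $U$. We construct a projected strategy $\widetilde{\sigma}_\varepsilon$ for player 1 in $G_r$ and an extended strategy $\pi_\varepsilon$ for player 2 in $G$ as follows:

1. Strategy $\widetilde{\sigma}_\varepsilon$ in the game $G_r$:

• $\widetilde{\sigma}_\varepsilon(\widetilde{s}_0, (\widetilde{s}_0, i_0), \widetilde{s}_1, (\widetilde{s}_1, i_1), \ldots, \widetilde{s}_k)(i) = 1$ if and only if $\gamma_i = \arg\max_{\gamma \in \mathrm{AllowActs}(s_k)} \sum_{a \in \gamma} \sigma_\varepsilon(s_0, s_1, \ldots, s_k)(a)$.

• $\widetilde{\sigma}_\varepsilon(\widetilde{s}_0, (\widetilde{s}_0, i_0), \widetilde{s}_1, (\widetilde{s}_1, i_1), \ldots, \widetilde{s}_k, (\widetilde{s}_k, j))(j) = \sum_{a \in \gamma_j} \sigma_\varepsilon(s_0, s_1, \ldots, s_k)(a)$, and for all $a' \notin \gamma_j$ we have $\widetilde{\sigma}_\varepsilon(\widetilde{s}_0, (\widetilde{s}_0, i_0), \widetilde{s}_1, (\widetilde{s}_1, i_1), \ldots, \widetilde{s}_k, (\widetilde{s}_k, j))(a') = \sigma_\varepsilon(s_0, s_1, \ldots, s_k)(a')$.

2. Strategy $\pi_\varepsilon$ in the game $G$:

• $\pi_\varepsilon(s_0, s_1, \ldots, s_k) = \widetilde{\pi}(\widetilde{s}_0, (\widetilde{s}_0, i_0), \widetilde{s}_1, (\widetilde{s}_1, i_1), \ldots, (\widetilde{s}_k, i_k))$ such that for all $0 \le l \le k$, we have $\widetilde{\sigma}_\varepsilon(\widetilde{s}_0, (\widetilde{s}_0, i_0), \widetilde{s}_1, (\widetilde{s}_1, i_1), \ldots, \widetilde{s}_l)(i_l) = 1$.

Given a set of states $C \subseteq S_r \setminus \{w_1, w_2\}$ we denote by $C_G = \{ s \mid \widetilde{s} \in C \text{ or, for some } i,\ (\widetilde{s}, i) \in C \}$. Suppose that for some state $s$ we have $\Pr_s^{\widetilde{\sigma}_\varepsilon, \widetilde{\pi}}(\mathrm{Safe}_s(C)) > 0$, for some set $C \subseteq S_r \setminus \{w_1, w_2\}$. Then by construction of $\pi_\varepsilon$ we have $\Pr_s^{\sigma_\varepsilon, \pi_\varepsilon}(\mathrm{Safe}_s(C_G)) > 0$. It follows from an argument similar to Lemma 5 that there is a node $x \in Tr_s^{\sigma_\varepsilon, \pi_\varepsilon}$ such that $\Pr_x^{\sigma_\varepsilon, \pi_\varepsilon}(\mathrm{Safe}_s(C_G)) > 1 - \varepsilon'$, with $\varepsilon' \to 0$. Let us denote by $\mathcal{A}$ the event $\mathrm{Safe}_s(C_G)$. Note that for event $\mathcal{A}$, the strategy pair $(\sigma_\varepsilon, \pi_\varepsilon)$ is well-defined. Since $\sigma_\varepsilon$ is a perennial ε-optimal strategy, for all nodes $x_1 \in Tr_s^{\sigma_\varepsilon, \pi_\varepsilon}(x)$ we have $\Pr_{x_1}^{\sigma_\varepsilon, \pi_\varepsilon}(\Omega_o^s \mid \mathcal{A}) < c_2$, for $c_2 < 1$. Since $\widetilde{\pi}$ is a perennial bounded-positive optimal strategy in $G_r$, for all nodes $x_1 \in Tr_s^{\sigma_\varepsilon, \pi_\varepsilon}(x)$ we have $\Pr_{x_1}^{\sigma_\varepsilon, \pi_\varepsilon}(\Omega_e^s \mid \mathcal{A}) < c_1$, for $c_1 < 1$. However, this contradicts Lemma 4. Hence for every state $s \in S_r \setminus \{w_1, w_2\}$ we have $\Pr_s^{\widetilde{\sigma}_\varepsilon, \widetilde{\pi}}(\mathrm{Safe}_s(S_r \setminus \{w_1, w_2\})) = 0$, and hence $\Pr_s^{\widetilde{\sigma}_\varepsilon, \widetilde{\pi}}(\mathrm{Reach}(\{w_1, w_2\})) = 1$. Since $\sigma_\varepsilon \in \Sigma^{\ell}_\varepsilon$, it follows from the construction of the game $G_r$, Fact 1, and the property of locally ε-optimal strategies that $\Pr_s^{\widetilde{\sigma}_\varepsilon, \widetilde{\pi}}(\mathrm{Reach}(\{w_2\})) \le \varepsilon$. Thus $\Pr_s^{\widetilde{\sigma}_\varepsilon, \widetilde{\pi}}(\Omega_e \cap \Omega_s) \ge \Pr_s^{\widetilde{\sigma}_\varepsilon, \widetilde{\pi}}(\mathrm{Reach}(\{w_1\})) \ge 1 - \varepsilon$. This contradicts the assumption that $\widetilde{\pi}$ is bounded-positive optimal. ∎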




Limit-sure witness [6]. The witness strategy $\sigma$ for a limit-sure game constructed in [6] consists of the following parts: a ranking function of the states, and a ranking function of the actions at a state. The ranking functions are described by a μ-calculus formula. The witness strategy $\sigma$, at round $k$ of a play and at a state $s$, plays the actions of the least rank at $s$ with positive-bounded probabilities and the other actions with vanishingly small probabilities (as a function of ε), in appropriate proportion as described by the ranking function. Hence, the strategy $\sigma$ can be described as

$$\sigma = (1 - \varepsilon_k) \cdot \sigma_{\ell} + \varepsilon_k \cdot \sigma_d(\varepsilon_k),$$

where $\sigma_{\ell}$ is any selector $\xi$ such that $\mathrm{Supp}(\xi)$ is the set of actions with least rank, and $\sigma_d(\varepsilon_k)$ denotes a selector with $\mathrm{Supp}(\sigma_d(\varepsilon_k)) = \Gamma_1 \setminus \mathrm{Supp}(\sigma_{\ell})$. Hence the strategy $\sigma$ plays the moves in $\mathrm{Supp}(\sigma_d(\varepsilon_k))$ with vanishingly small probability as $\varepsilon_k \to 0$. We call the set of actions with the least rank, i.e., $\mathrm{Supp}(\sigma_{\ell})$, the limit-sure witness move set. It follows from the above construction that as $\varepsilon \to 0$, the limit-sure winning strategy $\sigma$ converges to the memoryless selector $\sigma_{\ell}$, i.e., the limit of the limit-sure witness strategy is a memoryless strategy.
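
The shape of the witness selector, a mixture that puts weight $1 - \varepsilon_k$ on the least-rank actions and weight $\varepsilon_k$ on the rest, can be sketched as follows. This is a deliberate simplification: both components are taken to be uniform distributions, whereas the construction in [6] fixes the proportions through the ranking function; the function and parameter names are hypothetical.

```python
from fractions import Fraction as F

def witness_selector(least_rank, others, eps_k):
    """Return (1 - eps_k) * uniform(least_rank) + eps_k * uniform(others).

    least_rank: actions in the limit-sure witness move set (non-empty);
    others: remaining actions, played with vanishing total probability eps_k.
    """
    if not others:
        eps_k = F(0)  # nothing to perturb toward
    dist = {a: (1 - eps_k) / len(least_rank) for a in least_rank}
    dist.update({a: eps_k / len(others) for a in others})
    return dist

d = witness_selector(["a", "b"], ["c"], F(1, 10))
```

As `eps_k` tends to 0 the probability of every action outside the least-rank set tends to 0, so the mixture converges to the selector supported on the least-rank actions only.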

Lemma 8 At any state $(\widetilde{s}, i)$, if the limit-sure witness move set for player 1 is $\gamma$ and $(\gamma \setminus \{i\}) \ne \emptyset$, then $(\gamma \setminus \{i\}) \in \mathrm{AllowActs}(s)$.

Proof. Consider a move $a \in \gamma \setminus \{i\}$. If there is a move $b \in \Gamma_2(s)$ such that $\widetilde{\delta}((\widetilde{s}, i), a, b)(w_2) > 0$, we would obtain a contradiction to the hypothesis that player 1, at $(\widetilde{s}, i)$, wins with probability arbitrarily close to 1. Hence, we have $\mathrm{Dest}(s, a, b) \subseteq VC(r)$ for every move $b \in \Gamma_2(s)$, leading to the result. ∎

Lemma 9 (Union-closure of AllowActs(s)) For all states $s$, if $\gamma_1 \in \mathrm{AllowActs}(s)$ and $\gamma_2 \in \mathrm{AllowActs}(s)$, then $\gamma_1 \cup \gamma_2 \in \mathrm{AllowActs}(s)$.

Proof. It follows from the properties of “one-step” matrix games that if $\xi_1$ and $\xi_2$ are optimal strategies for a player, then any convex combination of $\xi_1$ and $\xi_2$ is also an optimal strategy. Thus if $\xi_1 \in \Lambda^{\ell}$ and $\xi_2 \in \Lambda^{\ell}$, then there exists $\xi \in \Lambda^{\ell}$ such that $\mathrm{Supp}(\xi) = \mathrm{Supp}(\xi_1) \cup \mathrm{Supp}(\xi_2)$. The lemma follows. ∎
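
The convexity argument can be checked numerically on a small matrix game. The sketch below is an illustration only: the payoff matrix, its value 1/2, and the two optimal strategies are assumed toy data, and the helper names are hypothetical.

```python
from fractions import Fraction as F

def guarantees(payoff, xi, v):
    """True if the mixed row strategy xi secures at least v against every column."""
    cols = range(len(next(iter(payoff.values()))))
    return all(sum(p * payoff[a][j] for a, p in xi.items()) >= v for j in cols)

def mix(xi1, xi2, lam):
    """Convex combination lam * xi1 + (1 - lam) * xi2, with 0 < lam < 1."""
    keys = set(xi1) | set(xi2)
    return {a: lam * xi1.get(a, 0) + (1 - lam) * xi2.get(a, 0) for a in keys}

# Assumed toy one-step game for the row player; its value is 1/2.
payoff = {"a": [F(1), F(0)], "b": [F(0), F(1)], "c": [F(1, 2), F(1, 2)]}
xi1 = {"a": F(1, 2), "b": F(1, 2)}   # optimal, support {a, b}
xi2 = {"c": F(1)}                    # optimal, support {c}
mixed = mix(xi1, xi2, F(1, 3))
support = {a for a, p in mixed.items() if p > 0}
```

Any strictly interior mixing weight yields a strategy that still secures the value and whose support is the union of the two supports, which is the property the lemma uses.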

Lemma 10 At any state $(\widetilde{s}, i)$, if the limit-sure witness move set for player 1 is $\gamma$, then $\gamma_j = ((\gamma \setminus \{i\}) \cup \gamma_i) \in \mathrm{AllowActs}(s)$.

Proof. The lemma follows from Lemma 8 and Lemma 9. ∎
Lemma 11 For every state $\widetilde{s}$ there is a pure memoryless move $j$ for player 1 and a limit-sure winning strategy $\sigma$ such that $\mathrm{Supp}(\sigma)(\widetilde{s}) = \{j\}$ and the limit-sure witness move set at $(\widetilde{s}, j)$ is $\{j\}$.

Proof. The existence of a pure memoryless move is a consequence of the fact that every state $\widetilde{s}$ is a player-1 turn-based state and of the witness construction in [6]. The rest follows from Lemma 10. ∎

Definition 9 (Value-class qualitative optimal strategy) A strategy $\sigma_\varepsilon$ is a value-class qualitative optimal strategy for a value class $VC(r)$, with $0 < r < 1$, if

1. $\sigma_\varepsilon$ is locally ε-optimal.

2. Let $\pi$ be an arbitrary strategy for player 2. For a state $s \in \overline{W_1 \cup W_2}$, for all nodes $x$ in $Tr_s^{\sigma_\varepsilon,\pi}$ such that $\langle x \rangle \in VC(r)$, $\Pr_x^{\sigma_\varepsilon,\pi}(\Omega_e^s \mid \mathrm{Safe}(VC(r))) \ge 1 - \varepsilon$.

A strategy $\sigma_\varepsilon$ is value-class qualitative optimal if it is value-class qualitative optimal for all value classes $VC(r)$ with $0 < r < 1$. ∎

The  existence  of  value-class  qualitative  optimal  strategies  follows  from 
Lemma  7  and  Lemma  11. 

Lemma 12 The set of value-class qualitative optimal strategies is non-empty.

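Lemma 13 Let $\sigma_\varepsilon$ be a locally ε-optimal strategy. For all strategies $\pi$ for player 2, for all nodes $x \in Tr_s^{\sigma_\varepsilon,\pi}$, if $\Pr_x^{\sigma_\varepsilon,\pi}(\mathrm{Reach}(W_1 \cup W_2)) = 1$, then $\Pr_x^{\sigma_\varepsilon,\pi}(\mathrm{Reach}(W_1)) \ge \langle\!\langle 1 \rangle\!\rangle_{val}(\Omega_e)(\langle x \rangle) - \varepsilon$.

Proof. The result follows from the fact that the sequence $(\langle\!\langle 1 \rangle\!\rangle_{val}(\Omega_e)(\Theta_i))_i$ is a sub-martingale under $\sigma_\varepsilon$ and $\pi$. ∎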

The following lemma shows that the value-class qualitative optimal strategies for different value classes can be “stitched” or composed together to produce a perennial ε-optimal strategy. This will allow us to produce witnesses for individual value classes and compose them to obtain a witness for a perennial ε-optimal strategy.

Lemma 14 (Stitching Lemma) Let $\sigma_\varepsilon$ be a value-class qualitative optimal strategy that is perennial ε-optimal for all states in $W_1$. Then $\sigma_\varepsilon$ is a perennial ε-optimal strategy.
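Proof. Consider any strategy $\pi$ for player 2 and consider the stochastic tree $T_s^{\sigma_\varepsilon,\pi}$ for any state $s$. For a node $x$ we define the set $\mathrm{SafeVal}(x) = \{ \omega = \langle s_0, s_1, \ldots \rangle \in \mathrm{Cone}(x) \mid \forall k \ge |x|.\ s_k \in VC(r), \text{ where } \langle x \rangle \in VC(r) \}$ as the set of paths that stay safe in the value class $VC(r)$ of $\langle x \rangle$ from $x$. Note that $\mathrm{Cone}(x) \setminus \mathrm{SafeVal}(x)$ denotes the set of paths that leave the value class $VC(r)$ from $x$. Let $\alpha = \max\{ \langle\!\langle 1 \rangle\!\rangle_{val}(\Omega_e)(s) \mid s \in S \setminus W_1 \}$, i.e., $\alpha$ is the maximum value for player 1 that is less than 1. Consider the following sets of nodes:

$$\kappa_1 = \{ x \in Tr_s^{\sigma_\varepsilon,\pi} \mid \Pr_x^{\sigma_\varepsilon,\pi}(\mathrm{SafeVal}(x)) \ge \alpha \}$$

$$\kappa_2 = \{ x \in Tr_s^{\sigma_\varepsilon,\pi} \mid \Pr_x^{\sigma_\varepsilon,\pi}(\mathrm{Cone}(x) \setminus \mathrm{SafeVal}(x)) > 1 - \alpha \}$$

Note that $\kappa_1 = Tr_s^{\sigma_\varepsilon,\pi} \setminus \kappa_2$ and hence for any node $x \in Tr_s^{\sigma_\varepsilon,\pi}$ we have $\Pr_x^{\sigma_\varepsilon,\pi}(\mathrm{ReachTree}(\kappa_1)) + \Pr_x^{\sigma_\varepsilon,\pi}(\mathrm{SafeTree}(\kappa_2)) = 1$. Consider the event $\mathcal{A} = \mathrm{SafeTree}(\kappa_2)$. Since $\sigma_\varepsilon$ is a locally ε-optimal strategy it follows that if a play leaves a value class $VC(r)$ with probability at least $(1 - \alpha) > 0$, then it reaches $VC_{>r}$ with positive bounded probability. It follows that $\kappa_2 \subseteq \{ x \mid \Pr_x^{\sigma_\varepsilon,\pi}(\mathrm{Reach}(W_1 \cup W_2)) \ge c > 0 \}$. Hence, for all nodes $x \in Tr_s^{\sigma_\varepsilon,\pi}$ we have $\inf_{x_1 \in Tr_s^{\sigma_\varepsilon,\pi}(x)} \Pr_{x_1}^{\sigma_\varepsilon,\pi}(\mathrm{Reach}(W_1 \cup W_2) \mid \mathcal{A}) > 0$. It follows from Lemma 1 that for all nodes $x \in Tr_s^{\sigma_\varepsilon,\pi}$ we have $\Pr_x^{\sigma_\varepsilon,\pi}(\mathrm{Reach}(W_1 \cup W_2) \mid \mathcal{A}) = 1$. Since $\sigma_\varepsilon$ is locally ε-optimal, it follows from Lemma 13 that

$$\Pr_x^{\sigma_\varepsilon,\pi}(\Omega_e^s \mid \mathrm{Reach}(W_1 \cup W_2)) \ge \Pr_x^{\sigma_\varepsilon,\pi}(\mathrm{Reach}(W_1)) \ge \langle\!\langle 1 \rangle\!\rangle_{val}(\Omega_e)(\langle x \rangle) - \varepsilon.$$

Since $\sigma_\varepsilon$ is a value-class qualitative optimal strategy we have $\Pr_x^{\sigma_\varepsilon,\pi}(\Omega_e^s \mid \mathrm{Safe}(VC(r))) \ge 1 - \varepsilon$. Therefore, for all nodes $x$ in $\kappa_1$ we have $\Pr_x^{\sigma_\varepsilon,\pi}(\Omega_e^s) \ge \alpha \cdot (1 - \varepsilon) \ge \alpha - \varepsilon$, since $\alpha < 1$. Thus for all nodes $x$ we have $\Pr_x^{\sigma_\varepsilon,\pi}(\Omega_e^s \mid \mathrm{ReachTree}(\kappa_1)) \ge \alpha - \varepsilon$. For all nodes $x$ we have

$$\Pr_x^{\sigma_\varepsilon,\pi}(\Omega_e^s) \ge \Pr_x^{\sigma_\varepsilon,\pi}(\Omega_e^s \mid \mathrm{SafeTree}(\kappa_2)) \cdot \Pr_x^{\sigma_\varepsilon,\pi}(\mathrm{SafeTree}(\kappa_2)) + \Pr_x^{\sigma_\varepsilon,\pi}(\Omega_e^s \mid \mathrm{ReachTree}(\kappa_1)) \cdot \Pr_x^{\sigma_\varepsilon,\pi}(\mathrm{ReachTree}(\kappa_1))$$

$$\ge (\langle\!\langle 1 \rangle\!\rangle_{val}(\Omega_e)(\langle x \rangle) - \varepsilon) \cdot \Pr_x^{\sigma_\varepsilon,\pi}(\mathrm{SafeTree}(\kappa_2)) + (\alpha - \varepsilon) \cdot \Pr_x^{\sigma_\varepsilon,\pi}(\mathrm{ReachTree}(\kappa_1)).$$

Since $\langle\!\langle 1 \rangle\!\rangle_{val}(\Omega_e)(\langle x \rangle) \le \alpha$ we have $\Pr_x^{\sigma_\varepsilon,\pi}(\Omega_e^s) \ge \langle\!\langle 1 \rangle\!\rangle_{val}(\Omega_e)(\langle x \rangle) - \varepsilon$. Hence $\sigma_\varepsilon$ is a perennial ε-optimal strategy. ∎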

The following theorem follows from the existence of memoryless limit-sure winning strategies for concurrent games with coBüchi objectives [6] and the existence of perennial ε-optimal strategies obtained by composing value-class qualitative optimal strategies across value classes (Lemma 14).

Theorem 2 (Memoryless ε-optimal strategies for coBüchi objectives) For every ε > 0, memoryless ε-optimal strategies exist for all coBüchi objectives on all concurrent games.

The following theorem states that there exist perennial $\varepsilon$-optimal
strategies that in the limit coincide with a locally optimal selector, i.e., a
memoryless strategy with locally optimal selectors. This parallels the result of
Mertens-Neyman [17] for concurrent games with limit-average objectives.

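Theorem 3 (Limit of $\varepsilon$-optimal strategies) For every $\varepsilon > 0$, there
exists a perennial $\varepsilon$-optimal strategy $\sigma_\varepsilon \in \Sigma_\varepsilon$ such that the
strategies $\sigma_\varepsilon$ converge to a locally optimal selector $\overline{\sigma}$ as
$\varepsilon \to 0$, i.e., $\lim_{\varepsilon \to 0} \sigma_\varepsilon = \overline{\sigma}$, where $\overline{\sigma}$ is memoryless.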

Proof. For arbitrary $\varepsilon > 0$, consider the perennial $\varepsilon$-optimal strategy $\sigma_\varepsilon$
constructed as a value-class qualitative optimal strategy. The fact that
the value-class qualitative optimal strategy is a perennial $\varepsilon$-optimal strategy
follows from Lemma 14. The result then follows from Lemma 11 and the fact
that the limit-sure winning strategies coincide in the limit with a memoryless
selector $\overline{\sigma}$ such that $\mathrm{Supp}(\overline{\sigma})$ is the set of least-rank actions of the
limit-sure witness. ∎
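
As a purely illustrative sketch of this convergence (the exact form of the limit-sure witness is given in [6] and is not reproduced here), suppose each action $a$ available at a state has a rank $\rho(a) \in \{0, 1, 2, \dots\}$ and the selector plays $a$ with weight $\varepsilon^{\rho(a)}$. Both $\rho$ and this weighting are assumptions made for the example. Writing $m = \min_b \rho(b)$ and $A_{\min} = \{\, b \mid \rho(b) = m \,\}$, we get

\[
\sigma_\varepsilon(a) \;=\; \frac{\varepsilon^{\rho(a)}}{\sum_b \varepsilon^{\rho(b)}}
 \;=\; \frac{\varepsilon^{\rho(a) - m}}{|A_{\min}| + \sum_{b \notin A_{\min}} \varepsilon^{\rho(b) - m}}
 \;\xrightarrow[\varepsilon \to 0]{}\;
 \begin{cases} 1 / |A_{\min}| & \text{if } \rho(a) = m, \\ 0 & \text{otherwise,} \end{cases}
\]

so under these assumptions the limiting distribution does not depend on the history and its support is exactly the set of least-rank actions.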

Witness for perennial $\varepsilon$-optimal strategies. The witness for a perennial
$\varepsilon$-optimal strategy $\sigma_\varepsilon$ is presented as a value-class qualitative optimal
strategy (recall Lemma 14). The existence of a value-class qualitative
optimal strategy is guaranteed by Lemma 12. The witness consists of the
limit-sure winning strategy witness in the game $G_r$, for all $0 < r < 1$, and
of a locally $\varepsilon$-optimal strategy. The witness can be described as follows:

•  Limit-sure witness. The limit-sure witness in the game $G_r$, for $r > 0$,
is constructed as the witness described in [6]. Observe that the
game $G_r$ can be exponential in the size of the game $G$, since the
set AllowActs(s) can be exponential. To obtain an efficient polynomial
witness we make the following key observation: at every state $s$ there
is a pure memoryless move $i$ for player 1 (Lemma 11) in the limit-sure
witness strategy. Hence player 1 constructs a game $G'_r$ such that
at every state $s$ there is only a single successor $(s, i)$, where $i$ is a pure
memoryless move in the limit-sure witness in $G_r$. The size of the graph $G'_r$
is linear in the size of the game $G$. The witness in state $(s, i)$ is the
witness as described in [6]: it consists of a ranking function
of the actions and a ranking function of the state space. The witness is
polynomial and can be verified in time polynomial in the size of the game
graph.

•  Locally $\varepsilon$-optimal witness. The locally $\varepsilon$-optimal witness consists of
the following:

1.  The values of the game at every state $s$, within $\varepsilon$ precision.

2.  The locally optimal selector $\overline{\sigma} \in \Sigma$. Note that the selector $\overline{\sigma}$
may specify probabilities that are irrational. The locally optimal
selector $\overline{\sigma}$ is $\varepsilon$-approximated by a $k$-uniform selector $\overline{\sigma}_k$, where a
$k$-uniform selector is a selector such that the associated
probabilities of the distribution are multiples of $\frac{1}{k}$. It follows from [4, 14]
that $k$ is polynomial in the size of the game graph and $\frac{1}{\varepsilon}$. The
strategy $\overline{\sigma}_k$ must satisfy the constraint that $\mathrm{Supp}(\overline{\sigma}_k)$ is exactly
the set of actions with the least rank as described by the limit-sure
witness. The verification of the witness can be achieved in
polynomial time, since checking local optimality involves
verifying that $\overline{\sigma}_k$ is optimal for the "one-step" game with respect to
the values at every state.
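
The per-state checks on the locally $\varepsilon$-optimal witness can be sketched in code. The following is a minimal sketch under stated assumptions, not the verification procedure of the paper: the function names, the dictionary encoding of selectors and transitions, and the two-action example are all hypothetical, and local optimality is simplified to the one-sided test that the selector guarantees at least the claimed value minus $\varepsilon$ in the one-step game.

```python
from fractions import Fraction

def is_k_uniform(selector, k):
    """True if `selector` (action -> probability) is a distribution whose
    probabilities are all non-negative multiples of 1/k."""
    probs = list(selector.values())
    return sum(probs) == 1 and all(
        p >= 0 and (p * k).denominator == 1 for p in probs
    )

def one_step_guarantee(selector, moves2, delta, values):
    """Worst case, over player-2 moves b, of the expected successor value
    when player 1 samples her move from `selector`.
    delta[(a, b)] maps successor states to transition probabilities."""
    return min(
        sum(
            p * sum(q * values[t] for t, q in delta[(a, b)].items())
            for a, p in selector.items()
        )
        for b in moves2
    )

def check_state(selector, k, least_rank_actions, moves2, delta, values, v_s, eps):
    """Assumed per-state check: k-uniformity, support equal to the least-rank
    actions, and a one-step guarantee of at least v_s - eps."""
    support = {a for a, p in selector.items() if p > 0}
    return (
        is_k_uniform(selector, k)
        and support == set(least_rank_actions)
        and one_step_guarantee(selector, moves2, delta, values) >= v_s - eps
    )

# Hypothetical one-step game: player 1 wins iff both players pick matching moves.
values = {"win": Fraction(1), "lose": Fraction(0)}
delta = {
    ("a0", "b0"): {"win": Fraction(1)},
    ("a0", "b1"): {"lose": Fraction(1)},
    ("a1", "b0"): {"lose": Fraction(1)},
    ("a1", "b1"): {"win": Fraction(1)},
}
uniform = {"a0": Fraction(1, 2), "a1": Fraction(1, 2)}
pure = {"a0": Fraction(1), "a1": Fraction(0)}
```

Exact rational arithmetic (`Fraction`) is used so that the multiple-of-$\frac{1}{k}$ test is not affected by floating-point rounding. Each check is a single pass over the action pairs and successor states, which matches the polynomial-time verification claimed above.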

It follows from the above that there are polynomial witnesses for perennial
$\varepsilon$-optimal strategies and that these witnesses can be verified in polynomial
time. This shows that the values of concurrent parity games can be decided within
$\varepsilon$-precision in NP. Since concurrent parity games are closed under
complementation, the decision procedure is also in coNP. This gives us the following
theorem.

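Theorem 4 (Computational complexity of concurrent parity games)

For all constants $\varepsilon > 0$,

1.  for all rational $r$, whether $\langle\!\langle 1 \rangle\!\rangle_{\mathit{val}}(\Omega_e)(s) \geq r - \varepsilon$ can be decided in
NP $\cap$ coNP;

2.  the value functions $\langle\!\langle 1 \rangle\!\rangle_{\mathit{val}}(\Omega_e)$ and $\langle\!\langle 2 \rangle\!\rangle_{\mathit{val}}(\Omega_o)$ can be approximated
within $\varepsilon$-precision in time exponential in $|G|$ and polynomial in $\frac{1}{\varepsilon}$.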

The best previously known algorithm to approximate values is triple
exponential in the size of the game graph and logarithmic in $\frac{1}{\varepsilon}$ [8].